commit 5f2d0708acd0e1d2475d73c61819053de284bcc4 Author: Greg Kroah-Hartman Date: Fri Jun 21 14:38:50 2024 +0200 Linux 6.6.35 Link: https://lore.kernel.org/r/20240619125606.345939659@linuxfoundation.org Tested-by: Florian Fainelli Tested-by: Harshit Mogalapalli Tested-by: SeongJae Park Tested-by: Jon Hunter Tested-by: Allen Pais Tested-by: Kelsey Steele Tested-by: Mark Brown Tested-by: Takeshi Ogasawara Tested-by: Ron Economos Tested-by: Linux Kernel Functional Testing Tested-by: Peter Schneider  Tested-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit 3466abafa9f4f81a869da828e52a12f693175ec2 Author: Oleg Nesterov Date: Sat Jun 8 14:06:16 2024 +0200 zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING [ Upstream commit 7fea700e04bd3f424c2d836e98425782f97b494e ] kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov Reported-by: Rachel Menge Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/ Reviewed-by: Boqun Feng Tested-by: Wei Fu Reviewed-by: Jens Axboe Cc: Allen Pais Cc: Christian Brauner Cc: Frederic Weisbecker Cc: Joel Fernandes (Google) Cc: Joel Granados Cc: Josh Triplett Cc: Lai Jiangshan Cc: Mateusz Guzik Cc: Mathieu Desnoyers Cc: Mike Christie Cc: Neeraj Upadhyay Cc: Paul E. McKenney Cc: Steven Rostedt (Google) Cc: Zqiang Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin commit 2fd6cfb2a4e6ec13b97f8ae9323298aabcd7a515 Author: Jean Delvare Date: Fri May 31 11:17:48 2024 +0200 i2c: designware: Fix the functionality flags of the slave-only interface [ Upstream commit cbf3fb5b29e99e3689d63a88c3cddbffa1b8de99 ] When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 5b6d721b266a ("i2c: designware: enable SLAVE in platform module") Signed-off-by: Jean Delvare Cc: Luis Oliveira Cc: Jarkko Nikula Cc: Andy Shevchenko Cc: Mika Westerberg Cc: Jan Dabros Cc: Andi Shyti Reviewed-by: Andy Shevchenko Acked-by: Jarkko Nikula Tested-by: Jarkko Nikula Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin commit 572afd43c959f44b59a5ba268c57125f09d2fbe5 Author: Jean Delvare Date: Fri May 31 11:19:14 2024 +0200 i2c: at91: Fix the functionality flags of the slave-only interface [ Upstream commit d6d5645e5fc1233a7ba950de4a72981c394a2557 ] When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 9d3ca54b550c ("i2c: at91: added slave mode support") Signed-off-by: Jean Delvare Cc: Juergen Fitschen Cc: Ludovic Desroches Cc: Codrin Ciubotariu Cc: Andi Shyti Cc: Nicolas Ferre Cc: Alexandre Belloni Cc: Claudiu Beznea Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin commit a4cd6074aed688a524758809aa351151481a4da7 Author: Yongzhi Liu Date: Thu May 23 20:14:34 2024 +0800 misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe() [ Upstream commit 77427e3d5c353e3dd98c7c0af322f8d9e3131ace ] There is a memory leak (forget to free allocated buffers) in a memory allocation failure path. Fix it to jump to the correct error handling code. Fixes: 393fc2f5948f ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.") Signed-off-by: Yongzhi Liu Reviewed-by: Kumaravel Thiagarajan Link: https://lore.kernel.org/r/20240523121434.21855-4-hyperlyzcs@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit 2cc32639ec347e3365075b130f9953ef16cb13f1 Author: Shichao Lai Date: Sun May 26 09:27:45 2024 +0800 usb-storage: alauda: Check whether the media is initialized [ Upstream commit 16637fea001ab3c8df528a8995b3211906165a30 ] The member "uzonesize" of struct alauda_info will remain 0 if alauda_init_media() fails, potentially causing divide errors in alauda_read_data() and alauda_write_lba(). - Add a member "media_initialized" to struct alauda_info. - Change a condition in alauda_check_media() to ensure the first initialization. - Add an error check for the return value of alauda_init_media(). Fixes: e80b0fade09e ("[PATCH] USB Storage: add alauda support") Reported-by: xingwei lee Reported-by: yue sun Reviewed-by: Alan Stern Signed-off-by: Shichao Lai Link: https://lore.kernel.org/r/20240526012745.2852061-1-shichaorai@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit 3a03ef31c1e953bfd505b0e574df70194bbb7696 Author: Andy Shevchenko Date: Tue May 14 22:05:53 2024 +0300 serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw [ Upstream commit 87d80bfbd577912462061b1a45c0ed9c7fcb872f ] The container of the struct dw8250_port_data is private to the actual driver. In particular, 8250_lpss and 8250_dw use different data types that are assigned to the UART port private_data. Hence, it must not be used outside the specific driver. Currently the only cpr_val is required by the common code, make it be available via struct dw8250_port_data. This fixes the UART breakage on Intel Galileo boards. Fixes: 593dea000bc1 ("serial: 8250: dw: Allow to use a fallback CPR value if not synthesized") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240514190730.2787071-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit 836e1a9fd8ebaad63a4f81be4e1df51b4ddb2553 Author: Andy Shevchenko Date: Wed Mar 6 16:33:22 2024 +0200 serial: 8250_dw: Replace ACPI device check by a quirk [ Upstream commit 173b097dcc8d74d6e135aed1bad38dbfa21c4d04 ] Instead of checking for APMC0D08 ACPI device presence, use a quirk based on driver data. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240306143322.3291123-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin commit 1d98b6a0b90c1b04be6360d670ab3802d444e9fa Author: Andy Shevchenko Date: Mon Mar 4 14:27:08 2024 +0200 serial: 8250_dw: Switch to use uart_read_port_properties() [ Upstream commit e6a46d073e11baba785245860c9f51adbbb8b68d ] Since we have now a common helper to read port properties use it instead of sparse home grown solution. Reviewed-by: Andi Shyti Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240304123035.758700-8-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin commit f59e2391d3a9ead402f27c3195cc18c571e4fc43 Author: Andy Shevchenko Date: Mon Mar 4 14:27:04 2024 +0200 serial: port: Introduce a common helper to read properties [ Upstream commit e894b6005dce0ed621b2788d6a249708fb6f95f9 ] Several serial drivers want to read the same or similar set of the port properties. Make a common helper for them. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240304123035.758700-4-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin commit 68a53d1212ed492049629542990781639922a83e Author: Andy Shevchenko Date: Mon Mar 4 14:27:03 2024 +0200 serial: core: Add UPIO_UNKNOWN constant for unknown port type [ Upstream commit 79d713baf63c8f23cc58b304c40be33d64a12aaf ] In some APIs we would like to assign the special value to iotype and compare against it in another places. Introduce UPIO_UNKNOWN for this purpose. Note, we can't use 0, because it's a valid value for IO port access. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240304123035.758700-3-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin commit 1006d1b5eb01e2dafe36b93ba7f3025c9ca89773 Author: Andy Shevchenko Date: Wed Oct 25 21:42:57 2023 +0300 device property: Implement device_is_big_endian() [ Upstream commit 826a5d8c9df9605fb4fdefa45432f95580241a1f ] Some users want to use the struct device pointer to see if the device is big endian in terms of Open Firmware specifications, i.e. if it has a "big-endian" property, or if the kernel was compiled for BE *and* the device has a "native-endian" property. Provide inline helper for the users. Signed-off-by: Andy Shevchenko Acked-by: Greg Kroah-Hartman Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20231025184259.250588-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin commit dd431c3ac1fc34a9268580dd59ad3e3c76b32a8c Author: Stefan Berger Date: Fri Mar 22 10:03:12 2024 -0400 ima: Fix use-after-free on a dentry's dname.name commit be84f32bb2c981ca670922e047cdde1488b233de upstream. ->d_name.name can change on rename and the earlier value can be freed; there are conditions sufficient to stabilize it (->d_lock on dentry, ->d_lock on its parent, ->i_rwsem exclusive on the parent's inode, rename_lock), but none of those are met at any of the sites. Take a stable snapshot of the name instead. Link: https://lore.kernel.org/all/20240202182732.GE2087318@ZenIV/ Signed-off-by: Al Viro Signed-off-by: Stefan Berger Signed-off-by: Mimi Zohar Signed-off-by: Greg Kroah-Hartman commit 0b8fba38bdfb848fac52e71270b2aa3538c996ea Author: Sicong Huang Date: Tue Apr 16 16:03:13 2024 +0800 greybus: Fix use-after-free bug in gb_interface_release due to race condition. commit 5c9c5d7f26acc2c669c1dcf57d1bb43ee99220ce upstream. In gb_interface_create, &intf->mode_switch_completion is bound with gb_interface_mode_switch_work. Then it will be started by gb_interface_request_mode_switch. Here is the relevant code. if (!queue_work(system_long_wq, &intf->mode_switch_work)) { ... } If we call gb_interface_release to make cleanup, there may be an unfinished work. This function will call kfree to free the object "intf". However, if gb_interface_mode_switch_work is scheduled to run after kfree, it may cause use-after-free error as gb_interface_mode_switch_work will use the object "intf". The possible execution flow that may lead to the issue is as follows: CPU0 CPU1 | gb_interface_create | gb_interface_request_mode_switch gb_interface_release | kfree(intf) (free) | | gb_interface_mode_switch_work | mutex_lock(&intf->mutex) (use) Fix it by canceling the work before kfree. Signed-off-by: Sicong Huang Link: https://lore.kernel.org/r/20240416080313.92306-1-congei42@163.com Cc: Ronnie Sahlberg Signed-off-by: Greg Kroah-Hartman commit aefd8f343d90819cee799ee9d81508f831cedad0 Author: Matthieu Baerts (NGI0) Date: Wed Jun 5 11:21:17 2024 +0200 selftests: net: lib: avoid error removing empty netns name commit 79322174bcc780b99795cb89d237b26006a8b94b upstream. If there is an error to create the first netns with 'setup_ns()', 'cleanup_ns()' will be called with an empty string as first parameter. The consequences is that 'cleanup_ns()' will try to delete an invalid netns, and wait 20 seconds if the netns list is empty. Instead of just checking if the name is not empty, convert the string separated by spaces to an array. Manipulating the array is cleaner, and calling 'cleanup_ns()' with an empty array will be a no-op. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Cc: stable@vger.kernel.org Acked-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) Reviewed-by: Petr Machata Reviewed-by: Hangbin Liu Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 44bdef23572ce1e4d4578b1a831d0ef1b988fdae Author: Matthieu Baerts (NGI0) Date: Wed Jun 5 11:21:16 2024 +0200 selftests: net: lib: support errexit with busywait commit 41b02ea4c0adfcc6761fbfed42c3ce6b6412d881 upstream. If errexit is enabled ('set -e'), loopy_wait -- or busywait and others using it -- will stop after the first failure. Note that if the returned status of loopy_wait is checked, and even if errexit is enabled, Bash will not stop at the first error. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Cc: stable@vger.kernel.org Acked-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) Reviewed-by: Hangbin Liu Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 1d650d2c9bcc9c1fec4447b04ebf4057209048ad Author: Hangbin Liu Date: Tue May 14 10:33:59 2024 +0800 selftests/net/lib: no need to record ns name if it already exist commit 83e93942796db58652288f0391ac00072401816f upstream. There is no need to add the name to ns_list again if the netns already recoreded. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Signed-off-by: Hangbin Liu Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d722ed2530e1a17c4f31b510cdb72d1e02e6dbcf Author: Hangbin Liu Date: Wed Jan 24 14:13:44 2024 +0800 selftests/net/lib: update busywait timeout value commit fc836129f708407502632107e58d48f54b1caf75 upstream. The busywait timeout value is a millisecond, not a second. So the current setting 2 is too small. On slow/busy host (or VMs) the current timeout can expire even on "correct" execution, causing random failures. Let's copy the WAIT_TIMEOUT from forwarding/lib.sh and set BUSYWAIT_TIMEOUT here. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Signed-off-by: Hangbin Liu Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240124061344.1864484-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 8a73c08e00fe9fdb4a29e63d530709c64af74cae Author: David Howells Date: Fri Jan 19 20:49:34 2024 +0000 cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode commit c3d6569a43322f371e7ba0ad386112723757ac8f upstream. cachefiles_ondemand_init_object() as called from cachefiles_open_file() and cachefiles_create_tmpfile() does not check if object->ondemand is set before dereferencing it, leading to an oops something like: RIP: 0010:cachefiles_ondemand_init_object+0x9/0x41 ... Call Trace: cachefiles_open_file+0xc9/0x187 cachefiles_lookup_cookie+0x122/0x2be fscache_cookie_state_machine+0xbe/0x32b fscache_cookie_worker+0x1f/0x2d process_one_work+0x136/0x208 process_scheduled_works+0x3a/0x41 worker_thread+0x1a2/0x1f6 kthread+0xca/0xd2 ret_from_fork+0x21/0x33 Fix this by making cachefiles_ondemand_init_object() return immediately if cachefiles->ondemand is NULL. Fixes: 3c5ecfe16e76 ("cachefiles: extract ondemand info field from cachefiles_object") Reported-by: Marc Dionne Signed-off-by: David Howells cc: Gao Xiang cc: Chao Yu cc: Yue Hu cc: Jeffle Xu cc: linux-erofs@lists.ozlabs.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 3beccb6a326d1bfdd524bb78e761ba61720779ba Author: Beleswar Padhi Date: Mon May 6 19:48:49 2024 +0530 remoteproc: k3-r5: Jump to error handling labels in start/stop errors commit 1dc7242f6ee0c99852cb90676d7fe201cf5de422 upstream. In case of errors during core start operation from sysfs, the driver directly returns with the -EPERM error code. Fix this to ensure that mailbox channels are freed on error before returning by jumping to the 'put_mbox' error handling label. Similarly, jump to the 'out' error handling label to return with required -EPERM error code during the core stop operation from sysfs. Fixes: 3c8a9066d584 ("remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs") Signed-off-by: Beleswar Padhi Link: https://lore.kernel.org/r/20240506141849.1735679-1-b-padhi@ti.com Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman commit bb2f3187e3107d99c2900ff44d227a0dead74445 Author: Benjamin Poirier Date: Wed Jun 19 17:39:24 2024 +0800 selftests: forwarding: Avoid failures to source net/lib.sh commit 2114e83381d3289a88378850f43069e79f848083 upstream. The expression "source ../lib.sh" added to net/forwarding/lib.sh in commit 25ae948b4478 ("selftests/net: add lib.sh") does not work for tests outside net/forwarding which source net/forwarding/lib.sh (1). It also does not work in some cases where only a subset of tests are exported (2). Avoid the problems mentioned above by replacing the faulty expression with a copy of the content from net/lib.sh which is used by files under net/forwarding. A more thorough solution which avoids duplicating content between net/lib.sh and net/forwarding/lib.sh has been posted here: https://lore.kernel.org/netdev/20231222135836.992841-1-bpoirier@nvidia.com/ The approach in the current patch is a stopgap solution to avoid submitting large changes at the eleventh hour of this development cycle. Example of problem 1) tools/testing/selftests/drivers/net/bonding$ ./dev_addr_lists.sh ./net_forwarding_lib.sh: line 41: ../lib.sh: No such file or directory TEST: bonding cleanup mode active-backup [ OK ] TEST: bonding cleanup mode 802.3ad [ OK ] TEST: bonding LACPDU multicast address to slave (from bond down) [ OK ] TEST: bonding LACPDU multicast address to slave (from bond up) [ OK ] An error message is printed but since the test does not use functions from net/lib.sh, the test results are not affected. Example of problem 2) tools/testing/selftests$ make install TARGETS="net/forwarding" tools/testing/selftests$ cd kselftest_install/net/forwarding/ tools/testing/selftests/kselftest_install/net/forwarding$ ./pedit_ip.sh veth{0..3} lib.sh: line 41: ../lib.sh: No such file or directory TEST: ping [ OK ] TEST: ping6 [ OK ] ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth1 ingress pedit ip src set 198.51.100.1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth2 egress pedit ip src set 198.51.100.1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth1 ingress pedit ip dst set 198.51.100.1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth2 egress pedit ip dst set 198.51.100.1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth1 ingress pedit ip6 src set 2001:db8:2::1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth2 egress pedit ip6 src set 2001:db8:2::1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth1 ingress pedit ip6 dst set 2001:db8:2::1 [FAIL] Expected to get 10 packets, but got . ./pedit_ip.sh: line 135: busywait: command not found TEST: dev veth2 egress pedit ip6 dst set 2001:db8:2::1 [FAIL] Expected to get 10 packets, but got . In this case, the test results are affected. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Suggested-by: Ido Schimmel Suggested-by: Petr Machata Reviewed-by: Ido Schimmel Tested-by: Petr Machata Signed-off-by: Benjamin Poirier Reviewed-by: Hangbin Liu Link: https://lore.kernel.org/r/20240104141109.100672-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski Signed-off-by: Po-Hsu Lin Signed-off-by: Greg Kroah-Hartman commit 2a969959b94f796cd6bd4bad82de183c47afa432 Author: Hangbin Liu Date: Wed Jun 19 17:39:23 2024 +0800 selftests/net: add variable NS_LIST for lib.sh commit b6925b4ed57cccf42ca0fb46c7446f0859e7ad4b upstream. Add a global variable NS_LIST to store all the namespaces that setup_ns created, so the caller could call cleanup_all_ns() instead of remember all the netns names when using cleanup_ns(). Signed-off-by: Hangbin Liu Link: https://lore.kernel.org/r/20231213060856.4030084-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Po-Hsu Lin Signed-off-by: Greg Kroah-Hartman commit 04f7b9b4d7f884bc2fc38958f7721550aff50418 Author: Hangbin Liu Date: Wed Jun 19 17:39:22 2024 +0800 selftests/net: add lib.sh commit 25ae948b447881bf689d459cd5bd4629d9c04b20 upstream. Add a lib.sh for net selftests. This file can be used to define commonly used variables and functions. Some commonly used functions can be moved from forwarding/lib.sh to this lib file. e.g. busywait(). Add function setup_ns() for user to create unique namespaces with given prefix name. Reviewed-by: Petr Machata Signed-off-by: Hangbin Liu Signed-off-by: Paolo Abeni [PHLin: add lib.sh to TEST_FILES directly as we already have upstream commit 06efafd8 landed in 6.6.y] Signed-off-by: Po-Hsu Lin Signed-off-by: Greg Kroah-Hartman commit dd782da470761077f4d1120e191f1a35787cda6e Author: Sam James Date: Fri Jun 14 09:50:59 2024 +0100 Revert "fork: defer linking file vma until vma is fully initialized" This reverts commit cec11fa2eb512ebe3a459c185f4aca1d44059bbf which is commit 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 upstream. The backport is incomplete and causes xfstests failures. The consequences of the incomplete backport seem worse than the original issue, so pick the lesser evil and revert until a full backport is ready. Link: https://lore.kernel.org/stable/20240604004751.3883227-1-leah.rumancik@gmail.com/ Reported-by: Leah Rumancik Signed-off-by: Sam James Signed-off-by: Greg Kroah-Hartman commit 72b5c7f3b358ceb45d189a339a2fe9321f2375fd Author: Doug Brown Date: Sun May 19 12:19:30 2024 -0700 serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level commit 5208e7ced520a813b4f4774451fbac4e517e78b2 upstream. The FIFO is 64 bytes, but the FCR is configured to fire the TX interrupt when the FIFO is half empty (bit 3 = 0). Thus, we should only write 32 bytes when a TX interrupt occurs. This fixes a problem observed on the PXA168 that dropped a bunch of TX bytes during large transmissions. Fixes: ab28f51c77cd ("serial: rewrite pxa2xx-uart to use 8250_core") Signed-off-by: Doug Brown Link: https://lore.kernel.org/r/20240519191929.122202-1-doug@schmorgal.com Cc: stable Signed-off-by: Greg Kroah-Hartman commit 0d73477af964dbd7396163a13817baf13940bca9 Author: Miaohe Lin Date: Thu May 16 20:26:08 2024 +0800 mm/huge_memory: don't unpoison huge_zero_folio commit fe6f86f4b40855a130a19aa589f9ba7f650423f4 upstream. When I did memory failure tests recently, below panic occurs: kernel BUG at include/linux/mm.h:1135! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 Call Trace: do_shrink_slab+0x14f/0x6a0 shrink_slab+0xca/0x8c0 shrink_node+0x2d0/0x7d0 balance_pgdat+0x33a/0x720 kswapd+0x1f3/0x410 kthread+0xd5/0x100 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 The root cause is that HWPoison flag will be set for huge_zero_folio without increasing the folio refcnt. But then unpoison_memory() will decrease the folio refcnt unexpectedly as it appears like a successfully hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when releasing huge_zero_folio. Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue. We're not prepared to unpoison huge_zero_folio yet. Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Miaohe Lin Acked-by: David Hildenbrand Reviewed-by: Yang Shi Reviewed-by: Oscar Salvador Reviewed-by: Anshuman Khandual Cc: Naoya Horiguchi Cc: Xu Yu Cc: Signed-off-by: Andrew Morton Signed-off-by: Miaohe Lin Signed-off-by: Greg Kroah-Hartman commit 93d61e1bac0a25f6808efba406488f1cc9a0f29a Author: Oleg Nesterov Date: Tue May 28 14:20:19 2024 +0200 tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() commit 07c54cc5988f19c9642fd463c2dbdac7fc52f777 upstream. After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash when the boot CPU is nohz_full") the kernel no longer crashes, but there is another problem. In this case tick_setup_device() calls tick_take_do_timer_from_boot() to update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled) in smp_call_function_single(). Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new comment explains why this is safe (thanks Thomas!). Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com Signed-off-by: Greg Kroah-Hartman commit 614d397be0cf43412b3f94a0f6460eddced8ce92 Author: Ryusuke Konishi Date: Thu May 30 23:15:56 2024 +0900 nilfs2: fix potential kernel bug due to lack of writeback flag waiting commit a4ca369ca221bb7e06c725792ac107f0e48e82e7 upstream. Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit f8474caf39bdab52fe64b75b286ebb013e0f68c0 Author: Petr Tesarik Date: Mon Jun 17 11:23:15 2024 -0300 swiotlb: extend buffer pre-padding to alloc_align_mask if necessary commit af133562d5aff41fcdbe51f1a504ae04788b5fc0 upstream. Allow a buffer pre-padding of up to alloc_align_mask, even if it requires allocating additional IO TLB slots. If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero bits in the original address between IO_TLB_SIZE and alloc_align_mask, these bits are not preserved in the swiotlb buffer address. To fix this case, increase the allocation size and use a larger offset within the allocated buffer. As a result, extra padding slots may be allocated before the mapping start address. Leave orig_addr in these padding slots initialized to INVALID_PHYS_ADDR. These slots do not correspond to any CPU buffer, so attempts to sync the data should be ignored. The padding slots should be automatically released when the buffer is unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the DMA buffer slot, not the first padding slot. Save the number of padding slots in struct io_tlb_slot and use it to adjust the slot index in swiotlb_release_slots(), so all allocated slots are properly freed. Cc: stable@vger.kernel.org # v6.6+ Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present") Link: https://lore.kernel.org/linux-iommu/20240311210507.217daf8b@meshulam.tesarici.cz/ Signed-off-by: Petr Tesarik Reviewed-by: Michael Kelley Tested-by: Michael Kelley Signed-off-by: Christoph Hellwig Signed-off-by: Fabio Estevam Signed-off-by: Greg Kroah-Hartman commit 6c385c1fa0a7fc767138a7bb39603966d1519c57 Author: Will Deacon Date: Mon Jun 17 11:23:14 2024 -0300 swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE commit 14cebf689a78e8a1c041138af221ef6eac6bc7da upstream. For swiotlb allocations >= PAGE_SIZE, the slab search historically adjusted the stride to avoid checking unaligned slots. This had the side-effect of aligning large mapping requests to PAGE_SIZE, but that was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks"). Since this alignment could be relied upon drivers, reinstate PAGE_SIZE alignment for swiotlb mappings >= PAGE_SIZE. Cc: stable@vger.kernel.org # v6.6+ Reported-by: Michael Kelley Signed-off-by: Will Deacon Reviewed-by: Robin Murphy Reviewed-by: Petr Tesarik Tested-by: Nicolin Chen Tested-by: Michael Kelley Signed-off-by: Christoph Hellwig Signed-off-by: Fabio Estevam Signed-off-by: Greg Kroah-Hartman commit 6033fc9522d284b090268d75ce4c68fea5df105e Author: Will Deacon Date: Mon Jun 17 11:23:13 2024 -0300 swiotlb: Enforce page alignment in swiotlb_alloc() commit 823353b7cf0ea9dfb09f5181d5fb2825d727200b upstream. When allocating pages from a restricted DMA pool in swiotlb_alloc(), the buffer address is blindly converted to a 'struct page *' that is returned to the caller. In the unlikely event of an allocation bug, page-unaligned addresses are not detected and slots can silently be double-allocated. Add a simple check of the buffer alignment in swiotlb_alloc() to make debugging a little easier if something has gone wonky. Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Will Deacon Reviewed-by: Michael Kelley Reviewed-by: Petr Tesarik Tested-by: Nicolin Chen Tested-by: Michael Kelley Signed-off-by: Christoph Hellwig Signed-off-by: Fabio Estevam Signed-off-by: Greg Kroah-Hartman commit 9f2050106f3761fe57bb5aed3bf661662f6e653b Author: Andrey Albershteyn Date: Mon Jun 17 16:03:55 2024 -0700 xfs: allow cross-linking special files without project quota commit e23d7e82b707d1d0a627e334fb46370e4f772c11 upstream. There's an issue that if special files is created before quota project is enabled, then it's not possible to link this file. This works fine for normal files. This happens because xfs_quota skips special files (no ioctls to set necessary flags). The check for having the same project ID for source and destination then fails as source file doesn't have any ID. mkfs.xfs -f /dev/sda mount -o prjquota /dev/sda /mnt/test mkdir /mnt/test/foo mkfifo /mnt/test/foo/fifo1 xfs_quota -xc "project -sp /mnt/test/foo 9" /mnt/test > Setting up project 9 (path /mnt/test/foo)... > xfs_quota: skipping special file /mnt/test/foo/fifo1 > Processed 1 (/etc/projects and cmdline) paths for project 9 with recursion depth infinite (-1). ln /mnt/test/foo/fifo1 /mnt/test/foo/fifo1_link > ln: failed to create hard link '/mnt/test/testdir/fifo1_link' => '/mnt/test/testdir/fifo1': Invalid cross-device link mkfifo /mnt/test/foo/fifo2 ln /mnt/test/foo/fifo2 /mnt/test/foo/fifo2_link Fix this by allowing linking of special files to the project quota if special files doesn't have any ID set (ID = 0). Signed-off-by: Andrey Albershteyn Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 8bb0402836d0eb23a46b63115987b68907222f17 Author: Dave Chinner Date: Mon Jun 17 16:03:54 2024 -0700 xfs: don't use current->journal_info commit f2e812c1522dab847912309b00abcc762dd696da upstream. syzbot reported an ext4 panic during a page fault where found a journal handle when it didn't expect to find one. The structure it tripped over had a value of 'TRAN' in the first entry in the structure, and that indicates it tripped over a struct xfs_trans instead of a jbd2 handle. The reason for this is that the page fault was taken during a copy-out to a user buffer from an xfs bulkstat operation. XFS uses an "empty" transaction context for bulkstat to do automated metadata buffer cleanup, and so the transaction context is valid across the copyout of the bulkstat info into the user buffer. We are using empty transaction contexts like this in XFS to reduce the risk of failing to release objects we reference during the operation, especially during error handling. Hence we really need to ensure that we can take page faults from these contexts without leaving landmines for the code processing the page fault to trip over. However, this same behaviour could happen from any other filesystem that triggers a page fault or any other exception that is handled on-stack from within a task context that has current->journal_info set. Having a page fault from some other filesystem bounce into XFS where we have to run a transaction isn't a bug at all, but the usage of current->journal_info means that this could result corruption of the outer task's journal_info structure. The problem is purely that we now have two different contexts that now think they own current->journal_info. IOWs, no filesystem can allow page faults or on-stack exceptions while current->journal_info is set by the filesystem because the exception processing might use current->journal_info itself. If we end up with nested XFS transactions whilst holding an empty transaction, then it isn't an issue as the outer transaction does not hold a log reservation. If we ignore the current->journal_info usage, then the only problem that might occur is a deadlock if the exception tries to take the same locks the upper context holds. That, however, is not a problem that setting current->journal_info would solve, so it's largely an irrelevant concern here. IOWs, we really only use current->journal_info for a warning check in xfs_vm_writepages() to ensure we aren't doing writeback from a transaction context. Writeback might need to do allocation, so it can need to run transactions itself. Hence it's a debug check to warn us that we've done something silly, and largely it is not all that useful. So let's just remove all the use of current->journal_info in XFS and get rid of all the potential issues from nested contexts where current->journal_info might get misused by another filesystem context. Reported-by: syzbot+cdee56dbcdf0096ef605@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: "Darrick J. Wong" Reviewed-by: Mark Tinguely Reviewed-by: Christoph Hellwig Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 79ba47df4e98de33011aad98d5376cbe82cd3457 Author: Dave Chinner Date: Mon Jun 17 16:03:53 2024 -0700 xfs: allow sunit mount option to repair bad primary sb stripe values commit 15922f5dbf51dad334cde888ce6835d377678dc9 upstream. If a filesystem has a busted stripe alignment configuration on disk (e.g. because broken RAID firmware told mkfs that swidth was smaller than sunit), then the filesystem will refuse to mount due to the stripe validation failing. This failure is triggering during distro upgrades from old kernels lacking this check to newer kernels with this check, and currently the only way to fix it is with offline xfs_db surgery. This runtime validity checking occurs when we read the superblock for the first time and causes the mount to fail immediately. This prevents the rewrite of stripe unit/width via mount options that occurs later in the mount process. Hence there is no way to recover this situation without resorting to offline xfs_db rewrite of the values. However, we parse the mount options long before we read the superblock, and we know if the mount has been asked to re-write the stripe alignment configuration when we are reading the superblock and verifying it for the first time. Hence we can conditionally ignore stripe verification failures if the mount options specified will correct the issue. We validate that the new stripe unit/width are valid before we overwrite the superblock values, so we can ignore the invalid config at verification and fail the mount later if the new values are not valid. This, at least, gives users the chance of correcting the issue after a kernel upgrade without having to resort to xfs-db hacks. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit ae609281ecae5b1e0a64500aa37a2b9d4169719b Author: Long Li Date: Mon Jun 17 16:03:52 2024 -0700 xfs: ensure submit buffers on LSN boundaries in error handlers commit e4c3b72a6ea93ed9c1815c74312eee9305638852 upstream. While performing the IO fault injection test, I caught the following data corruption report: XFS (dm-0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x79c/0x1130 CPU: 3 PID: 33 Comm: kworker/3:0 Not tainted 6.5.0-rc7-next-20230825-00001-g7f8666926889 #214 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 Workqueue: xfs-inodegc/dm-0 xfs_inodegc_worker Call Trace: dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 xfs_free_ag_extent+0x7d3/0x1130 __xfs_free_extent+0x201/0x3c0 xfs_trans_free_extent+0x29b/0xa10 xfs_extent_free_finish_item+0x2a/0xb0 xfs_defer_finish_noroll+0x8d1/0x1b40 xfs_defer_finish+0x21/0x200 xfs_itruncate_extents_flags+0x1cb/0x650 xfs_free_eofblocks+0x18f/0x250 xfs_inactive+0x485/0x570 xfs_inodegc_worker+0x207/0x530 process_scheduled_works+0x24a/0xe10 worker_thread+0x5ac/0xc60 kthread+0x2cd/0x3c0 ret_from_fork+0x4a/0x80 ret_from_fork_asm+0x11/0x20 XFS (dm-0): Corruption detected. Unmount and run xfs_repair After analyzing the disk image, it was found that the corruption was triggered by the fact that extent was recorded in both inode datafork and AGF btree blocks. After a long time of reproduction and analysis, we found that the reason of free sapce btree corruption was that the AGF btree was not recovered correctly. Consider the following situation, Checkpoint A and Checkpoint B are in the same record and share the same start LSN1, buf items of same object (AGF btree block) is included in both Checkpoint A and Checkpoint B. If the buf item in Checkpoint A has been recovered and updates metadata LSN permanently, then the buf item in Checkpoint B cannot be recovered, because log recovery skips items with a metadata LSN >= the current LSN of the recovery item. If there is still an inode item in Checkpoint B that records the Extent X, the Extent X will be recorded in both inode datafork and AGF btree block after Checkpoint B is recovered. Such transaction can be seen when allocing enxtent for inode bmap, it record both the addition of extent to the inode extent list and the removing extent from the AGF. |------------Record (LSN1)------------------|---Record (LSN2)---| |-------Checkpoint A----------|----------Checkpoint B-----------| | Buf Item(Extent X) | Buf Item / Inode item(Extent X) | | Extent X is freed | Extent X is allocated | After commit 12818d24db8a ("xfs: rework log recovery to submit buffers on LSN boundaries") was introduced, we submit buffers on lsn boundaries during log recovery. The above problem can be avoided under normal paths, but it's not guaranteed under abnormal paths. Consider the following process, if an error was encountered after recover buf item in Checkpoint A and before recover buf item in Checkpoint B, buffers that have been added to the buffer_list will still be submitted, this violates the submits rule on lsn boundaries. So buf item in Checkpoint B cannot be recovered on the next mount due to current lsn of transaction equal to metadata lsn on disk. The detailed process of the problem is as follows. First Mount: xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... /* recover buf item in Checkpoint A */ xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer /* add buffer of agf btree block to buffer_list */ xfs_buf_delwri_queue(bp, buffer_list) ... ==> Encounter read IO error and return /* submit buffers regardless of error */ if (!list_empty(&buffer_list)) xfs_buf_delwri_submit(&buffer_list); Second Mount: xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... /* recover buf item in Checkpoint B */ xlog_recover_buf_commit_pass2 /* buffer of agf btree block wouldn't added to buffer_list due to lsn equal to current_lsn */ if (XFS_LSN_CMP(lsn, current_lsn) >= 0) goto out_release In order to make sure that submits buffers on lsn boundaries in the abnormal paths, we need to check error status before submit buffers that have been added from the last record processed. If error status exist, buffers in the bufffer_list should not be writen to disk. Canceling the buffers in the buffer_list directly isn't correct, unlike any other place where write list was canceled, these buffers has been initialized by xfs_buf_item_init() during recovery and held by buf item, buf items will not be released in xfs_buf_delwri_cancel(), it's not easy to solve. If the filesystem has been shut down, then delwri list submission will error out all buffers on the list via IO submission/completion and do all the correct cleanup automatically. So shutting down the filesystem could prevents buffers in the bufffer_list from being written to disk. Fixes: 50d5c8d8e938 ("xfs: check LSN ordering for v5 superblocks during recovery") Signed-off-by: Long Li Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 04fa4269089bcba9c31d0a6fa9ac64a830d8614d Author: Dave Chinner Date: Mon Jun 17 16:03:51 2024 -0700 xfs: shrink failure needs to hold AGI buffer commit 75bcffbb9e7563259b7aed0fa77459d6a3a35627 upstream. Chandan reported a AGI/AGF lock order hang on xfs/168 during recent testing. The cause of the problem was the task running xfs_growfs to shrink the filesystem. A failure occurred trying to remove the free space from the btrees that the shrink would make disappear, and that meant it ran the error handling for a partial failure. This error path involves restoring the per-ag block reservations, and that requires calculating the amount of space needed to be reserved for the free inode btree. The growfs operation hung here: [18679.536829] down+0x71/0xa0 [18679.537657] xfs_buf_lock+0xa4/0x290 [xfs] [18679.538731] xfs_buf_find_lock+0xf7/0x4d0 [xfs] [18679.539920] xfs_buf_lookup.constprop.0+0x289/0x500 [xfs] [18679.542628] xfs_buf_get_map+0x2b3/0xe40 [xfs] [18679.547076] xfs_buf_read_map+0xbb/0x900 [xfs] [18679.562616] xfs_trans_read_buf_map+0x449/0xb10 [xfs] [18679.569778] xfs_read_agi+0x1cd/0x500 [xfs] [18679.573126] xfs_ialloc_read_agi+0xc2/0x5b0 [xfs] [18679.578708] xfs_finobt_calc_reserves+0xe7/0x4d0 [xfs] [18679.582480] xfs_ag_resv_init+0x2c5/0x490 [xfs] [18679.586023] xfs_ag_shrink_space+0x736/0xd30 [xfs] [18679.590730] xfs_growfs_data_private.isra.0+0x55e/0x990 [xfs] [18679.599764] xfs_growfs_data+0x2f1/0x410 [xfs] [18679.602212] xfs_file_ioctl+0xd1e/0x1370 [xfs] trying to get the AGI lock. The AGI lock was held by a fstress task trying to do an inode allocation, and it was waiting on the AGF lock to allocate a new inode chunk on disk. Hence deadlock. The fix for this is for the growfs code to hold the AGI over the transaction roll it does in the error path. It already holds the AGF locked across this, and that is what causes the lock order inversion in the xfs_ag_resv_init() call. Reported-by: Chandan Babu R Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()") Signed-off-by: Dave Chinner Reviewed-by: Gao Xiang Reviewed-by: Christoph Hellwig Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit ea365e606231406a26ef755ca12fdfee7d418ed9 Author: Dave Chinner Date: Mon Jun 17 16:03:50 2024 -0700 xfs: fix SEEK_HOLE/DATA for regions with active COW extents commit 4b2f459d86252619448455013f581836c8b1b7da upstream. A data corruption problem was reported by CoreOS image builders when using reflink based disk image copies and then converting them to qcow2 images. The converted images failed the conversion verification step, and it was isolated down to the fact that qemu-img uses SEEK_HOLE/SEEK_DATA to find the data it is supposed to copy. The reproducer allowed me to isolate the issue down to a region of the file that had overlapping data and COW fork extents, and the problem was that the COW fork extent was being reported in it's entirity by xfs_seek_iomap_begin() and so skipping over the real data fork extents in that range. This was somewhat hidden by the fact that 'xfs_bmap -vvp' reported all the extents correctly, and reading the file completely (i.e. not using seek to skip holes) would map the file correctly and all the correct data extents are read. Hence the problem is isolated to just the xfs_seek_iomap_begin() implementation. Instrumentation with trace_printk made the problem obvious: we are passing the wrong length to xfs_trim_extent() in xfs_seek_iomap_begin(). We are passing the end_fsb, not the maximum length of the extent we want to trim the map too. Hence the COW extent map never gets trimmed to the start of the next data fork extent, and so the seek code treats the entire COW fork extent as unwritten and skips entirely over the data fork extents in that range. Link: https://github.com/coreos/coreos-assembler/issues/3728 Fixes: 60271ab79d40 ("xfs: fix SEEK_DATA for speculative COW fork preallocation") Signed-off-by: Dave Chinner Reviewed-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 7f0e5af2690aac3655004be51dd6c57ec53202db Author: Darrick J. Wong Date: Mon Jun 17 16:03:49 2024 -0700 xfs: fix scrub stats file permissions commit e610e856b938a1fc86e7ee83ad2f39716082bca7 upstream. When the kernel is in lockdown mode, debugfs will only show files that are world-readable and cannot be written, mmaped, or used with ioctl. That more or less describes the scrub stats file, except that the permissions are wrong -- they should be 0444, not 0644. You can't write the stats file, so the 0200 makes no sense. Meanwhile, the clear_stats file is only writable, but it got mode 0400 instead of 0200, which would make more sense. Fix both files so that they make sense. Fixes: d7a74cad8f451 ("xfs: track usage statistics of online fsck") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Signed-off-by: Chandan Babu R Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 20bccdb03a7ba4668785a8ab0bf8d79da53bf000 Author: Darrick J. Wong Date: Mon Jun 17 16:03:48 2024 -0700 xfs: fix imprecise logic in xchk_btree_check_block_owner commit c0afba9a8363f17d4efed22a8764df33389aebe8 upstream. A reviewer was confused by the init_sa logic in this function. Upon checking the logic, I discovered that the code is imprecise. What we want to do here is check that there is an ownership record in the rmap btree for the AG that contains a btree block. For an inode-rooted btree (e.g. the bmbt) the per-AG btree cursors have not been initialized because inode btrees can span multiple AGs. Therefore, we must initialize the per-AG btree cursors in sc->sa before proceeding. That is what init_sa controls, and hence the logic should be gated on XFS_BTREE_ROOT_IN_INODE, not XFS_BTREE_LONG_PTRS. In practice, ROOT_IN_INODE and LONG_PTRS are coincident so this hasn't mattered. However, we're about to refactor both of those flags into separate btree_ops fields so we want this the logic to make sense afterwards. Fixes: 858333dcf021a ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Catherine Hoang Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 092571ef9a812566c8f2c9038d9c2a64c49788d6 Author: Filipe Manana Date: Wed May 8 11:51:07 2024 +0100 btrfs: zoned: fix use-after-free due to race with dev replace commit 0090d6e1b210551e63cf43958dc7a1ec942cdde9 upstream. While loading a zone's info during creation of a block group, we can race with a device replace operation and then trigger a use-after-free on the device that was just replaced (source device of the replace operation). This happens because at btrfs_load_zone_info() we extract a device from the chunk map into a local variable and then use the device while not under the protection of the device replace rwsem. So if there's a device replace operation happening when we extract the device and that device is the source of the replace operation, we will trigger a use-after-free if before we finish using the device the replace operation finishes and frees the device. Fix this by enlarging the critical section under the protection of the device replace rwsem so that all uses of the device are done inside the critical section. CC: stable@vger.kernel.org # 6.1.x: 15c12fcc50a1: btrfs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 09a46725cc84: btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 9e0e3e74dc69: btrfs: zoned: factor out single bg handling from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 87463f7e0250: btrfs: zoned: factor out DUP bg handling from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x Reviewed-by: Johannes Thumshirn Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 069e0cc343dad019527c648284272cf15e115558 Author: Christoph Hellwig Date: Mon Jun 5 10:51:08 2023 +0200 btrfs: zoned: factor out DUP bg handling from btrfs_load_block_group_zone_info commit 87463f7e0250d471fac41e7c9c45ae21d83b5f85 upstream. Split the code handling a type DUP block group from btrfs_load_block_group_zone_info to make the code more readable. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 68713bc70dab1d11cdd2d19a132c03ec2ba4c413 Author: Christoph Hellwig Date: Mon Jun 5 10:51:07 2023 +0200 btrfs: zoned: factor out single bg handling from btrfs_load_block_group_zone_info commit 9e0e3e74dc6928a0956f4e27e24d473c65887e96 upstream. Split the code handling a type single block group from btrfs_load_block_group_zone_info to make the code more readable. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit a139ad664240654b69d3e82e9554cf9056427650 Author: Christoph Hellwig Date: Mon Jun 5 10:51:06 2023 +0200 btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info commit 09a46725cc84165af452d978a3532d6b97a28796 upstream. Split out a helper for the body of the per-zone loop in btrfs_load_block_group_zone_info to make the function easier to read and modify. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit f9526760879af01c006a200facbb00b595b8b2ab Author: Christoph Hellwig Date: Mon Jun 5 10:51:05 2023 +0200 btrfs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info commit 15c12fcc50a1b12a747f8b6ec05cdb18c537a4d1 upstream. Add a new zone_info structure to hold per-zone information in btrfs_load_block_group_zone_info and prepare for breaking out helpers from it. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit a3be677629e4e0d246284956ff422392fafd715f Author: Tomi Valkeinen Date: Mon Apr 15 19:00:23 2024 +0300 pmdomain: ti-sci: Fix duplicate PD referrals commit 670c900f69645db394efb38934b3344d8804171a upstream. When the dts file has multiple referrers to a single PD (e.g. simple-framebuffer and dss nodes both point to the DSS power-domain) the ti-sci driver will create two power domains, both with the same ID, and that will cause problems as one of the power domains will hide the other one. Fix this checking if a PD with the ID has already been created, and only create a PD for new IDs. Fixes: efa5c01cd7ee ("soc: ti: ti_sci_pm_domains: switch to use multiple genpds instead of one") Signed-off-by: Tomi Valkeinen Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240415-ti-sci-pd-v1-1-a0e56b8ad897@ideasonboard.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit 6fd062713d9995de9e68ac3a071213eee1c11ea9 Author: Alexander Shishkin Date: Mon Apr 29 16:01:19 2024 +0300 intel_th: pci: Add Lunar Lake support commit f866b65322bfbc8fcca13c25f49e1a5c5a93ae4d upstream. Add support for the Trace Hub in Lunar Lake. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-16-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit ebcef91164846ef225d444d77a3706db422d0954 Author: Alexander Shishkin Date: Mon Apr 29 16:01:17 2024 +0300 intel_th: pci: Add Meteor Lake-S support commit c4a30def564d75e84718b059d1a62cc79b137cf9 upstream. Add support for the Trace Hub in Meteor Lake-S. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-14-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit c8727ddde12c3b0df109e12034b4904de62ab1de Author: Alexander Shishkin Date: Mon Apr 29 16:01:16 2024 +0300 intel_th: pci: Add Sapphire Rapids SOC support commit 2e1da7efabe05cb0cf0b358883b2bc89080ed0eb upstream. Add support for the Trace Hub in Sapphire Rapids SOC. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-13-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 37eb9f7cc7197a32b3d289df2035075f524f1c65 Author: Alexander Shishkin Date: Mon Apr 29 16:01:15 2024 +0300 intel_th: pci: Add Granite Rapids SOC support commit 854afe461b009801a171b3a49c5f75ea43e4c04c upstream. Add support for the Trace Hub in Granite Rapids SOC. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-12-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 3b08df88b00d8c7a6844fb465fa276e2bd6d0475 Author: Alexander Shishkin Date: Mon Apr 29 16:01:14 2024 +0300 intel_th: pci: Add Granite Rapids support commit e44937889bdf4ecd1f0c25762b7226406b9b7a69 upstream. Add support for the Trace Hub in Granite Rapids. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-11-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit f287b1e34f1dd910723ca720300548c27a9a72d7 Author: Imre Deak Date: Tue May 21 17:30:22 2024 +0300 drm/i915: Fix audio component initialization commit 75800e2e4203ea83bbc9d4f63ad97ea582244a08 upstream. After registering the audio component in i915_audio_component_init() the audio driver may call i915_audio_component_get_power() via the component ops. This could program AUD_FREQ_CNTRL with an uninitialized value if the latter function is called before display.audio.freq_cntrl gets initialized. The get_power() function also does a modeset which in the above case happens too early before the initialization step and triggers the "Reject display access from task" error message added by the Fixes: commit below. Fix the above issue by registering the audio component only after the initialization step. Fixes: 87c1694533c9 ("drm/i915: save AUD_FREQ_CNTRL state at audio domain suspend") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10291 Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Imre Deak Reviewed-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20240521143022.3784539-1-imre.deak@intel.com (cherry picked from commit fdd0b80172758ce284f19fa8a26d90c61e4371d2) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman commit 7a9883be3b98673333eec65c4a21cc18e60292eb Author: Vidya Srinivas Date: Mon May 20 22:26:34 2024 +0530 drm/i915/dpt: Make DPT object unshrinkable commit 43e2b37e2ab660c3565d4cff27922bc70e79c3f1 upstream. In some scenarios, the DPT object gets shrunk but the actual framebuffer did not and thus its still there on the DPT's vm->bound_list. Then it tries to rewrite the PTEs via a stale CPU mapping. This causes panic. Cc: stable@vger.kernel.org Reported-by: Shawn Lee Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt") Signed-off-by: Vidya Srinivas [vsyrjala: Add TODO comment] Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20240520165634.1162470-1-vidya.srinivas@intel.com (cherry picked from commit 51064d471c53dcc8eddd2333c3f1c1d9131ba36c) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman commit 1b4a8b89bf6787090b56424d269bf84ba00c3263 Author: Wachowski, Karol Date: Mon May 20 12:05:14 2024 +0200 drm/shmem-helper: Fix BUG_ON() on mmap(PROT_WRITE, MAP_PRIVATE) commit 39bc27bd688066a63e56f7f64ad34fae03fbe3b8 upstream. Lack of check for copy-on-write (COW) mapping in drm_gem_shmem_mmap allows users to call mmap with PROT_WRITE and MAP_PRIVATE flag causing a kernel panic due to BUG_ON in vmf_insert_pfn_prot: BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); Return -EINVAL early if COW mapping is detected. This bug affects all drm drivers using default shmem helpers. It can be reproduced by this simple example: void *ptr = mmap(0, size, PROT_WRITE, MAP_PRIVATE, fd, mmap_offset); ptr[0] = 0; Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: Noralf Trønnes Cc: Eric Anholt Cc: Rob Herring Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: # v5.2+ Signed-off-by: Wachowski, Karol Signed-off-by: Jacek Lawrynowicz Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20240520100514.925681-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 1d2f1123a05e3e269cd7564005b0b717f2014437 Author: Chris Wilson Date: Tue Apr 23 18:23:10 2024 +0200 drm/i915/gt: Disarm breadcrumbs if engines are already idle commit 70cb9188ffc75e643debf292fcddff36c9dbd4ae upstream. The breadcrumbs use a GT wakeref for guarding the interrupt, but are disarmed during release of the engine wakeref. This leaves a hole where we may attach a breadcrumb just as the engine is parking (after it has parked its breadcrumbs), execute the irq worker with some signalers still attached, but never be woken again. That issue manifests itself in CI with IGT runner timeouts while tests are waiting indefinitely for release of all GT wakerefs. <6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5 <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4 <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3 <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2 <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6 <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02 <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT. ... <6> [299.356526] sysrq: Show State ... <6> [299.373964] task:i915_selftest state:D stack:11784 pid:5578 tgid:5578 ppid:873 flags:0x00004002 <6> [299.373967] Call Trace: <6> [299.373968] <6> [299.373970] __schedule+0x3bb/0xda0 <6> [299.373974] schedule+0x41/0x110 <6> [299.373976] intel_wakeref_wait_for_idle+0x82/0x100 [i915] <6> [299.374083] ? __pfx_var_wake_function+0x10/0x10 <6> [299.374087] live_engine_busy_stats+0x9b/0x500 [i915] <6> [299.374173] __i915_subtests+0xbe/0x240 [i915] <6> [299.374277] ? __pfx___intel_gt_live_setup+0x10/0x10 [i915] <6> [299.374369] ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915] <6> [299.374456] intel_engine_live_selftests+0x1c/0x30 [i915] <6> [299.374547] __run_selftests+0xbb/0x190 [i915] <6> [299.374635] i915_live_selftests+0x4b/0x90 [i915] <6> [299.374717] i915_pci_probe+0x10d/0x210 [i915] At the end of the interrupt worker, if there are no more engines awake, disarm the breadcrumb and go to sleep. Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission") Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026 Signed-off-by: Chris Wilson Cc: Andrzej Hajda Cc: # v5.12+ Signed-off-by: Janusz Krzysztofik Acked-by: Nirmoy Das Reviewed-by: Andrzej Hajda Reviewed-by: Andi Shyti Signed-off-by: Andi Shyti Link: https://patchwork.freedesktop.org/patch/msgid/20240423165505.465734-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit fbad43eccae5cb14594195c20113369aabaa22b5) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman commit 42524cc5feef81d51fce1b1c277bb12afb02a093 Author: Daniel Bristot de Oliveira Date: Wed Apr 24 16:36:51 2024 +0200 rtla/auto-analysis: Replace \t with spaces commit a40e5e4dd0207485dee75e2b8e860d5853bcc5f7 upstream. When copying timerlat auto-analysis from a terminal to some web pages or chats, the \t are being replaced with a single ' ' or ' ', breaking the output. For example: ## CPU 3 hit stop tracing, analyzing it ## IRQ handler delay: 1.30 us (0.11 %) IRQ latency: 1.90 us Timerlat IRQ duration: 3.00 us (0.24 %) Blocking thread: 1223.16 us (99.00 %) insync:4048 1223.16 us IRQ interference 4.93 us (0.40 %) local_timer:236 4.93 us ------------------------------------------------------------------------ Thread latency: 1235.47 us (100%) Replace \t with spaces to avoid this problem. Link: https://lkml.kernel.org/r/ec7ed2b2809c22ab0dfc8eb7c805ab9cddc4254a.1713968967.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Jonathan Corbet Cc: Juri Lelli Fixes: 27e348b221f6 ("rtla/timerlat: Add auto-analysis core") Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Greg Kroah-Hartman commit d32f12e157327c47967bb153c28508b1166db072 Author: Daniel Bristot de Oliveira Date: Wed Apr 24 16:36:50 2024 +0200 rtla/timerlat: Simplify "no value" printing on top commit 5f0769331a965675cdfec97c09f3f6e875d7c246 upstream. Instead of printing three times the same output, print it only once, reducing lines and being sure that all no values have the same length. It also fixes an extra '\n' when running the with kernel threads, like here: =============== %< ============== Timer Latency 0 00:00:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max 2 #0 | - - - - | 161 161 161 161 3 #0 | - - - - | 161 161 161 161 8 #1 | 54 54 54 54 | - - - -'\n' ---------------|----------------------------------------|--------------------------------------- ALL #1 e0 | 54 54 54 | 161 161 161 =============== %< ============== This '\n' should have been removed with the user-space support that added another '\n' if not running with kernel threads. Link: https://lkml.kernel.org/r/0a4d8085e7cd706733a5dc10a81ca38b82bd4992.1713968967.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Jonathan Corbet Cc: Juri Lelli Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support") Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Greg Kroah-Hartman commit 8661a7af04991201640863ad1a0983173f84b5eb Author: Nam Cao Date: Wed May 15 07:50:40 2024 +0200 riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context commit fb1cf0878328fe75d47f0aed0a65b30126fcefc4 upstream. __kernel_map_pages() is a debug function which clears the valid bit in page table entry for deallocated pages to detect illegal memory accesses to freed pages. This function set/clear the valid bit using __set_memory(). __set_memory() acquires init_mm's semaphore, and this operation may sleep. This is problematic, because __kernel_map_pages() can be called in atomic context, and thus is illegal to sleep. An example warning that this causes: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd preempt_count: 2, expected: 0 CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37 Hardware name: riscv-virtio,qemu (DT) Call Trace: [] dump_backtrace+0x1c/0x24 [] show_stack+0x2c/0x38 [] dump_stack_lvl+0x5a/0x72 [] dump_stack+0x14/0x1c [] __might_resched+0x104/0x10e [] __might_sleep+0x3e/0x62 [] down_write+0x20/0x72 [] __set_memory+0x82/0x2fa [] __kernel_map_pages+0x5a/0xd4 [] __alloc_pages_bulk+0x3b2/0x43a [] __vmalloc_node_range+0x196/0x6ba [] copy_process+0x72c/0x17ec [] kernel_clone+0x60/0x2fe [] kernel_thread+0x82/0xa0 [] kthreadd+0x14a/0x1be [] ret_from_fork+0xe/0x1c Rewrite this function with apply_to_existing_page_range(). It is fine to not have any locking, because __kernel_map_pages() works with pages being allocated/deallocated and those pages are not changed by anyone else in the meantime. Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support") Signed-off-by: Nam Cao Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de Signed-off-by: Palmer Dabbelt Signed-off-by: Greg Kroah-Hartman commit 6ee0c842d4ad24823cebda6fc26261952acf7482 Author: Jean-Baptiste Maneyrol Date: Fri Apr 26 13:58:14 2024 +0000 iio: invensense: fix interrupt timestamp alignment commit 0340dc4c82590d8735c58cf904a8aa1173273ab5 upstream. Restrict interrupt timestamp alignment for not overflowing max/min period thresholds. Fixes: 0ecc363ccea7 ("iio: make invensense timestamp module generic") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol Link: https://lore.kernel.org/r/20240426135814.141837-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 955c824d543cb8b028b3efb1eaeec7bcdc7b6d44 Author: Nuno Sa Date: Fri Apr 26 17:42:13 2024 +0200 iio: adc: axi-adc: make sure AXI clock is enabled commit 80721776c5af6f6dce7d84ba8df063957aa425a2 upstream. We can only access the IP core registers if the bus clock is enabled. As such we need to get and enable it and not rely on anyone else to do it. Note this clock is a very fundamental one that is typically enabled pretty early during boot. Independently of that, we should really rely on it to be enabled. Fixes: ef04070692a2 ("iio: adc: adi-axi-adc: add support for AXI ADC IP core") Signed-off-by: Nuno Sa Link: https://lore.kernel.org/r/20240426-ad9467-new-features-v2-4-6361fc3ba1cc@analog.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 33187fa1a8bbcfc39b34c369ea3d1dcab07ae557 Author: Beleswar Padhi Date: Tue Apr 30 16:23:07 2024 +0530 remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs commit 3c8a9066d584f5010b6f4ba03bf6b19d28973d52 upstream. PSC controller has a limitation that it can only power-up the second core when the first core is in ON state. Power-state for core0 should be equal to or higher than core1. Therefore, prevent core1 from powering up before core0 during the start process from sysfs. Similarly, prevent core0 from shutting down before core1 has been shut down from sysfs. Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem") Signed-off-by: Beleswar Padhi Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240430105307.1190615-3-b-padhi@ti.com Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman commit 2494bc856e7ce50b1c4fd8afb4d17f2693f36565 Author: Apurva Nandan Date: Tue Apr 30 16:23:06 2024 +0530 remoteproc: k3-r5: Wait for core0 power-up before powering up core1 commit 61f6f68447aba08aeaa97593af3a7d85a114891f upstream. PSC controller has a limitation that it can only power-up the second core when the first core is in ON state. Power-state for core0 should be equal to or higher than core1, else the kernel is seen hanging during rproc loading. Make the powering up of cores sequential, by waiting for the current core to power-up before proceeding to the next core, with a timeout of 2sec. Add a wait queue event in k3_r5_cluster_rproc_init call, that will wait for the current core to be released from reset before proceeding with the next core. Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem") Signed-off-by: Apurva Nandan Signed-off-by: Beleswar Padhi Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240430105307.1190615-2-b-padhi@ti.com Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman commit aa81c7b078e011078001602138dec573f06368b0 Author: Nuno Sa Date: Thu Mar 28 14:58:50 2024 +0100 dmaengine: axi-dmac: fix possible race in remove() commit 1bc31444209c8efae98cb78818131950d9a6f4d6 upstream. We need to first free the IRQ before calling of_dma_controller_free(). Otherwise we could get an interrupt and schedule a tasklet while removing the DMA controller. Fixes: 0e3b67b348b8 ("dmaengine: Add support for the Analog Devices AXI-DMAC DMA controller") Cc: stable@kernel.org Signed-off-by: Nuno Sa Link: https://lore.kernel.org/r/20240328-axi-dmac-devm-probe-v3-1-523c0176df70@analog.com Signed-off-by: Vinod Koul Signed-off-by: Greg Kroah-Hartman commit 4145835ec2096435033046a9bfdc70b6243eaf64 Author: Rick Wertenbroek Date: Wed Apr 3 16:45:08 2024 +0200 PCI: rockchip-ep: Remove wrong mask on subsys_vendor_id commit 2dba285caba53f309d6060fca911b43d63f41697 upstream. Remove wrong mask on subsys_vendor_id. Both the Vendor ID and Subsystem Vendor ID are u16 variables and are written to a u32 register of the controller. The Subsystem Vendor ID was always 0 because the u16 value was masked incorrectly with GENMASK(31,16) resulting in all lower 16 bits being set to 0 prior to the shift. Remove both masks as they are unnecessary and set the register correctly i.e., the lower 16-bits are the Vendor ID and the upper 16-bits are the Subsystem Vendor ID. This is documented in the RK3399 TRM section 17.6.7.1.17 [kwilczynski: removed unnecesary newline] Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/linux-pci/20240403144508.489835-1-rick.wertenbroek@gmail.com Signed-off-by: Rick Wertenbroek Signed-off-by: Krzysztof Wilczyński Signed-off-by: Bjorn Helgaas Reviewed-by: Damien Le Moal Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 3c361f313d696df72f9bccf058510e9ec737b9b1 Author: Su Yue Date: Mon Apr 8 16:20:39 2024 +0800 ocfs2: fix races between hole punching and AIO+DIO commit 952b023f06a24b2ad6ba67304c4c84d45bea2f18 upstream. After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block", fstests/generic/300 become from always failed to sometimes failed: ======================================================================== [ 473.293420 ] run fstests generic/300 [ 475.296983 ] JBD2: Ignoring recovery information on journal [ 475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0) with ordered data mode. [ 494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag: Owner 5668 has an extent at cpos 78723 which can no longer be found [ 494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. [ 494.292018 ] OCFS2: File system is now read-only. [ 494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272 ERROR: status = -30 [ 494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374 ERROR: status = -3 fio: io_u error on file /mnt/scratch/racer: Read-only file system: write offset=460849152, buflen=131072 ========================================================================= In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add unwritten extents to a list. extents are also inserted into extent tree in ocfs2_write_begin_nolock. Then another thread call fallocate to puch a hole at one of the unwritten extent. The extent at cpos was removed by ocfs2_remove_extent(). At end io worker thread, ocfs2_search_extent_list found there is no such extent at the cpos. T1 T2 T3 inode lock ... insert extents ... inode unlock ocfs2_fallocate __ocfs2_change_file_space inode lock lock ip_alloc_sem ocfs2_remove_inode_range inode ocfs2_remove_btree_range ocfs2_remove_extent ^---remove the extent at cpos 78723 ... unlock ip_alloc_sem inode unlock ocfs2_dio_end_io ocfs2_dio_end_io_write lock ip_alloc_sem ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_search_extent_list ^---failed to find extent ... unlock ip_alloc_sem In most filesystems, fallocate is not compatible with racing with AIO+DIO, so fix it by adding to wait for all dio before fallocate/punch_hole like ext4. Link: https://lkml.kernel.org/r/20240408082041.20925-3-glass.su@suse.com Fixes: b25801038da5 ("ocfs2: Support xfs style space reservation ioctls") Signed-off-by: Su Yue Reviewed-by: Joseph Qi Cc: Changwei Ge Cc: Gang He Cc: Joel Becker Cc: Jun Piao Cc: Junxiao Bi Cc: Mark Fasheh Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 7ec0e3b86f5ab0dd2c278b7bd7644e5648d2a2c2 Author: Su Yue Date: Mon Apr 8 16:20:41 2024 +0800 ocfs2: use coarse time for new created files commit b8cb324277ee16f3eca3055b96fce4735a5a41c6 upstream. The default atime related mount option is '-o realtime' which means file atime should be updated if atime <= ctime or atime <= mtime. atime should be updated in the following scenario, but it is not: ========================================================== $ rm /mnt/testfile; $ echo test > /mnt/testfile $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 $ sleep 5 $ cat /mnt/testfile > /dev/null $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 ========================================================== And the reason the atime in the test is not updated is that ocfs2 calls ktime_get_real_ts64() in __ocfs2_mknod_locked during file creation. Then inode_set_ctime_current() is called in inode_set_ctime_current() calls ktime_get_coarse_real_ts64() to get current time. ktime_get_real_ts64() is more accurate than ktime_get_coarse_real_ts64(). In my test box, I saw ctime set by ktime_get_coarse_real_ts64() is less than ktime_get_real_ts64() even ctime is set later. The ctime of the new inode is smaller than atime. The call trace is like: ocfs2_create ocfs2_mknod __ocfs2_mknod_locked .... ktime_get_real_ts64 <------- set atime,ctime,mtime, more accurate ocfs2_populate_inode ... ocfs2_init_acl ocfs2_acl_set_mode inode_set_ctime_current current_time ktime_get_coarse_real_ts64 <-------less accurate ocfs2_file_read_iter ocfs2_inode_lock_atime ocfs2_should_update_atime atime <= ctime ? <-------- false, ctime < atime due to accuracy So here call ktime_get_coarse_real_ts64 to set inode time coarser while creating new files. It may lower the accuracy of file times. But it's not a big deal since we already use coarse time in other places like ocfs2_update_inode_atime and inode_set_ctime_current. Link: https://lkml.kernel.org/r/20240408082041.20925-5-glass.su@suse.com Fixes: c62c38f6b91b ("ocfs2: replace CURRENT_TIME macro") Signed-off-by: Su Yue Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 518fbd644dabb6aedbdd4939c6c9cc1bf651459f Author: Rik van Riel Date: Tue May 7 09:18:58 2024 -0400 fs/proc: fix softlockup in __read_vmcore commit 5cbcb62dddf5346077feb82b7b0c9254222d3445 upstream. While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear. While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible. Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched. Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel Acked-by: Baoquan He Cc: Dave Young Cc: Vivek Goyal Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit ba04b459efd11c057cd7dbc9dc5da9af0d4f4b31 Author: Trond Myklebust Date: Mon May 6 12:30:04 2024 -0400 knfsd: LOOKUP can return an illegal error value commit e221c45da3770962418fb30c27d941bbc70d595a upstream. The 'NFS error' NFSERR_OPNOTSUPP is not described by any of the official NFS related RFCs, but appears to have snuck into some older .x files for NFSv2. Either way, it is not in RFC1094, RFC1813 or any of the NFSv4 RFCs, so should not be returned by the knfsd server, and particularly not by the "LOOKUP" operation. Instead, let's return NFSERR_STALE, which is more appropriate if the filesystem encodes the filehandle as FILEID_INVALID. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Greg Kroah-Hartman commit 591d5b12f8147cc88cc9c2b21740c6166d36c817 Author: Vamshi Gajjela Date: Tue May 7 14:07:41 2024 -0700 spmi: hisi-spmi-controller: Do not override device identifier commit eda4923d78d634482227c0b189d9b7ca18824146 upstream. 'nr' member of struct spmi_controller, which serves as an identifier for the controller/bus. This value is a dynamic ID assigned in spmi_controller_alloc, and overriding it from the driver results in an ida_free error "ida_free called for id=xx which is not allocated". Signed-off-by: Vamshi Gajjela Fixes: 70f59c90c819 ("staging: spmi: add Hikey 970 SPMI controller driver") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240228185116.1269-1-vamshigajjela@google.com Signed-off-by: Stephen Boyd Link: https://lore.kernel.org/r/20240507210809.3479953-5-sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman commit e293c6b38ac9029d76ff0d2a6b2d74131709a9a8 Author: Hagar Gamal Halim Hemdan Date: Tue Apr 30 08:59:16 2024 +0000 vmci: prevent speculation leaks by sanitizing event in event_deliver() commit 8003f00d895310d409b2bf9ef907c56b42a4e0f4 upstream. Coverity spotted that event_msg is controlled by user-space, event_msg->event_data.event is passed to event_deliver() and used as an index without sanitization. This change ensures that the event index is sanitized to mitigate any possibility of speculative information leaks. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Only compile tested, no access to HW. Fixes: 1d990201f9bb ("VMCI: event handling implementation.") Cc: stable Signed-off-by: Hagar Gamal Halim Hemdan Link: https://lore.kernel.org/stable/20231127193533.46174-1-hagarhem%40amazon.com Link: https://lore.kernel.org/r/20240430085916.4753-1-hagarhem@amazon.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman commit 2d11505e79c7c4191e9f117da3a058d7f24c4189 Author: Fedor Pchelkin Date: Wed May 22 21:13:08 2024 +0300 dma-buf: handle testing kthreads creation failure commit 6cb05d89fd62a76a9b74bd16211fb0930e89fea8 upstream. kthread creation may possibly fail inside race_signal_callback(). In such a case stop the already started threads, put the already taken references to them and return with error code. Found by Linux Verification Center (linuxtesting.org). Fixes: 2989f6451084 ("dma-buf: Add selftests for dma-fence") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin Reviewed-by: T.J. Mercier Link: https://patchwork.freedesktop.org/patch/msgid/20240522181308.841686-1-pchelkin@ispras.ru Signed-off-by: Christian König Signed-off-by: Greg Kroah-Hartman commit e946428439a0d2079959f5603256ac51b6047017 Author: Thadeu Lima de Souza Cascardo Date: Fri May 24 11:47:02 2024 -0300 sock_map: avoid race between sock_map_close and sk_psock_put commit 4b4647add7d3c8530493f7247d11e257ee425bf0 upstream. sk_psock_get will return NULL if the refcount of psock has gone to 0, which will happen when the last call of sk_psock_put is done. However, sk_psock_drop may not have finished yet, so the close callback will still point to sock_map_close despite psock being NULL. This can be reproduced with a thread deleting an element from the sock map, while the second one creates a socket, adds it to the map and closes it. That will trigger the WARN_ON_ONCE: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701 Modules linked in: CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701 Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02 RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293 RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000 RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0 RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3 R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840 R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870 FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0 Call Trace: unix_release+0x87/0xc0 net/unix/af_unix.c:1048 __sock_release net/socket.c:659 [inline] sock_close+0xbe/0x240 net/socket.c:1421 __fput+0x42b/0x8a0 fs/file_table.c:422 __do_sys_close fs/open.c:1556 [inline] __se_sys_close fs/open.c:1541 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1541 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb37d618070 Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070 RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Use sk_psock, which will only check that the pointer is not been set to NULL yet, which should only happen after the callbacks are restored. If, then, a reference can still be gotten, we may call sk_psock_stop and cancel psock->work. As suggested by Paolo Abeni, reorder the condition so the control flow is less convoluted. After that change, the reproducer does not trigger the WARN_ON_ONCE anymore. Suggested-by: Paolo Abeni Reported-by: syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355 Fixes: aadb2bb83ff7 ("sock_map: Fix a potential use-after-free in sock_map_close()") Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself") Cc: stable@vger.kernel.org Signed-off-by: Thadeu Lima de Souza Cascardo Acked-by: Jakub Sitnicki Link: https://lore.kernel.org/r/20240524144702.1178377-1-cascardo@igalia.com Signed-off-by: Paolo Abeni Signed-off-by: Greg Kroah-Hartman commit 2c581ca0d68fbe4ba0072a26aa5b32ff2fad1dae Author: Damien Le Moal Date: Tue May 28 15:28:52 2024 +0900 null_blk: Print correct max open zones limit in null_init_zoned_dev() commit 233e27b4d21c3e44eb863f03e566d3a22e81a7ae upstream. When changing the maximum number of open zones, print that number instead of the total number of zones. Fixes: dc4d137ee3b7 ("null_blk: add support for max open/active zone limit for zoned devices") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Reviewed-by: Niklas Cassel Link: https://lore.kernel.org/r/20240528062852.437599-1-dlemoal@kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 5fc6b708ef20002f017ae00719482954b5289ee0 Author: Matthias Maennich Date: Tue May 28 11:32:43 2024 +0000 kheaders: explicitly define file modes for archived headers commit 3bd27a847a3a4827a948387cc8f0dbc9fa5931d5 upstream. Build environments might be running with different umask settings resulting in indeterministic file modes for the files contained in kheaders.tar.xz. The file itself is served with 444, i.e. world readable. Archive the files explicitly with 744,a+X to improve reproducibility across build environments. --mode=0444 is not suitable as directories need to be executable. Also, 444 makes it hard to delete all the readonly files after extraction. Cc: stable@vger.kernel.org Signed-off-by: Matthias Maennich Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman commit fcb88dc66b72f6b0617a1d3e964fbcfbfe802b13 Author: Steven Rostedt (Google) Date: Mon May 20 20:57:37 2024 -0400 tracing/selftests: Fix kprobe event name test for .isra. functions commit 23a4b108accc29a6125ed14de4a044689ffeda78 upstream. The kprobe_eventname.tc test checks if a function with .isra. can have a kprobe attached to it. It loops through the kallsyms file for all the functions that have the .isra. name, and checks if it exists in the available_filter_functions file, and if it does, it uses it to attach a kprobe to it. The issue is that kprobes can not attach to functions that are listed more than once in available_filter_functions. With the latest kernel, the function that is found is: rapl_event_update.isra.0 # grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions rapl_event_update.isra.0 rapl_event_update.isra.0 It is listed twice. This causes the attached kprobe to it to fail which in turn fails the test. Instead of just picking the function function that is found in available_filter_functions, pick the first one that is listed only once in available_filter_functions. Cc: stable@vger.kernel.org Fixes: 604e3548236d ("selftests/ftrace: Select an existing function in kprobe_eventname test") Signed-off-by: Steven Rostedt (Google) Acked-by: Masami Hiramatsu (Google) Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit d63e501ac6da1faa1f865c9b6412cb56402283c1 Author: Nam Cao Date: Thu Apr 25 13:52:01 2024 +0200 riscv: fix overlap of allocated page and PTR_ERR commit 994af1825a2aa286f4903ff64a1c7378b52defe6 upstream. On riscv32, it is possible for the last page in virtual address space (0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that shouldn't happen. There is already some code to ensure memblock won't allocate the last page. However, buddy allocator is left unchecked. Fix this by reserving physical memory that would be mapped at virtual addresses greater than 0xfffff000. Reported-by: Björn Töpel Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao Cc: Tested-by: Björn Töpel Reviewed-by: Björn Töpel Reviewed-by: Mike Rapoport (IBM) Link: https://lore.kernel.org/r/20240425115201.3044202-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt Signed-off-by: Greg Kroah-Hartman commit 7063f15d2ae214fe00fae30b06e8bb47c56e8506 Author: Adrian Hunter Date: Fri Mar 15 09:13:34 2024 +0200 perf auxtrace: Fix multiple use of --itrace option commit bb69c912c4e8005cf1ee6c63782d2fc28838dee2 upstream. If the --itrace option is used more than once, the options are combined, but "i" and "y" (sub-)options can be corrupted because itrace_do_parse_synth_opts() incorrectly overwrites the period type and period with default values. For example, with: --itrace=i0ns --itrace=e The processing of "--itrace=e", resets the "i" period from 0 nanoseconds to the default 100 microseconds. Fix by performing the default setting of period type and period only if "i" or "y" are present in the currently processed --itrace value. Fixes: f6986c95af84ff2a ("perf session: Add instruction tracing options") Signed-off-by: Adrian Hunter Cc: Adrian Hunter Cc: Andi Kleen Cc: Ian Rogers Cc: Jiri Olsa Cc: Namhyung Kim Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 809a2ed1717918a868248e8683ae751fb3c5fc4a Author: Haifeng Xu Date: Mon May 13 10:39:48 2024 +0000 perf/core: Fix missing wakeup when waiting for context reference commit 74751ef5c1912ebd3e65c3b65f45587e05ce5d36 upstream. In our production environment, we found many hung tasks which are blocked for more than 18 hours. Their call traces are like this: [346278.191038] __schedule+0x2d8/0x890 [346278.191046] schedule+0x4e/0xb0 [346278.191049] perf_event_free_task+0x220/0x270 [346278.191056] ? init_wait_var_entry+0x50/0x50 [346278.191060] copy_process+0x663/0x18d0 [346278.191068] kernel_clone+0x9d/0x3d0 [346278.191072] __do_sys_clone+0x5d/0x80 [346278.191076] __x64_sys_clone+0x25/0x30 [346278.191079] do_syscall_64+0x5c/0xc0 [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50 [346278.191086] ? do_syscall_64+0x69/0xc0 [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20 [346278.191092] ? irqentry_exit+0x19/0x30 [346278.191095] ? exc_page_fault+0x89/0x160 [346278.191097] ? asm_exc_page_fault+0x8/0x30 [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae The task was waiting for the refcount become to 1, but from the vmcore, we found the refcount has already been 1. It seems that the task didn't get woken up by perf_event_release_kernel() and got stuck forever. The below scenario may cause the problem. Thread A Thread B ... ... perf_event_free_task perf_event_release_kernel ... acquire event->child_mutex ... get_ctx ... release event->child_mutex acquire ctx->mutex ... perf_free_event (acquire/release event->child_mutex) ... release ctx->mutex wait_var_event acquire ctx->mutex acquire event->child_mutex # move existing events to free_list release event->child_mutex release ctx->mutex put_ctx ... ... In this case, all events of the ctx have been freed, so we couldn't find the ctx in free_list and Thread A will miss the wakeup. It's thus necessary to add a wakeup after dropping the reference. Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()") Signed-off-by: Haifeng Xu Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Frederic Weisbecker Acked-by: Mark Rutland Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com Signed-off-by: Greg Kroah-Hartman commit 348008f0043cd5ba915cfde44027c59bdb8a6791 Author: Yazen Ghannam Date: Mon Apr 3 16:42:44 2023 +0000 x86/amd_nb: Check for invalid SMN reads commit c625dabbf1c4a8e77e4734014f2fde7aa9071a1f upstream. AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers. SMN accesses are done indirectly through an index/data pair in PCI config space. The PCI config access may fail and return an error code. This would prevent the "read" value from being updated. However, the PCI config access may succeed, but the return value may be invalid. This is in similar fashion to PCI bad reads, i.e. return all bits set. Most systems will return 0 for SMN addresses that are not accessible. This is in line with AMD convention that unavailable registers are Read-as-Zero/Writes-Ignored. However, some systems will return a "PCI Error Response" instead. This value, along with an error code of 0 from the PCI config access, will confuse callers of the amd_smn_read() function. Check for this condition, clear the return value, and set a proper error code. Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h") Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov (AMD) Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman commit d91ddd05082691e69b30744825d18ae799293258 Author: David Kaplan Date: Sun Jun 2 13:19:09 2024 -0500 x86/kexec: Fix bug with call depth tracking commit 93c1800b3799f17375989b0daf76497dd3e80922 upstream. The call to cc_platform_has() triggers a fault and system crash if call depth tracking is active because the GS segment has been reset by load_segments() and GS_BASE is now 0 but call depth tracking uses per-CPU variables to operate. Call cc_platform_has() earlier in the function when GS is still valid. [ bp: Massage. ] Fixes: 5d8213864ade ("x86/retbleed: Add SKL return thunk") Signed-off-by: David Kaplan Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Tom Lendacky Cc: Link: https://lore.kernel.org/r/20240603083036.637-1-bp@kernel.org Signed-off-by: Greg Kroah-Hartman commit 5c0fb9cb404a2efbbc319ff9d1b877cf4e47e950 Author: Hagar Hemdan Date: Fri May 31 16:21:44 2024 +0000 irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update() commit b97e8a2f7130a4b30d1502003095833d16c028b3 upstream. its_vlpi_prop_update() calls lpi_write_config() which obtains the mapping information for a VLPI without lock held. So it could race with its_vlpi_unmap(). Since all calls from its_irq_set_vcpu_affinity() require the same lock to be held, hoist the locking there instead of sprinkling the locking all over the place. This bug was discovered using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. [ tglx: Use guard() instead of goto ] Fixes: 015ec0386ab6 ("irqchip/gic-v3-its: Add VLPI configuration handling") Suggested-by: Marc Zyngier Signed-off-by: Hagar Hemdan Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier Link: https://lore.kernel.org/r/20240531162144.28650-1-hagarhem@amazon.com Signed-off-by: Greg Kroah-Hartman commit 6d0881a00d4cc20be3dd026f0a2ee11eecf8d54c Author: Michael J. Ruhl Date: Fri Feb 23 15:25:56 2024 -0500 clkdev: Update clkdev id usage to allow for longer names commit 99f4570cfba1e60daafde737cb7e395006d719e6 upstream. clkdev DEV ID information is limited to an array of 20 bytes (MAX_DEV_ID). It is possible that the ID could be longer than that. If so, the lookup will fail because the "real ID" will not match the copied value. For instance, generating a device name for the I2C Designware module using the PCI ID can result in a name of: i2c_designware.39424 clkdev_create() will store: i2c_designware.3942 The stored name is one off and will not match correctly during probe. Increase the size of the ID to allow for a longer name. Reviewed-by: Russell King (Oracle) Signed-off-by: Michael J. Ruhl Link: https://lore.kernel.org/r/20240223202556.2194021-1-michael.j.ruhl@intel.com Reviewed-by: Andy Shevchenko Signed-off-by: Stephen Boyd Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit dbf0787c2f4561ee60a7ac6a934cb36ea5e5784b Author: YonglongLi Date: Fri Jun 7 17:01:50 2024 +0200 mptcp: pm: update add_addr counters after connect commit 40eec1795cc27b076d49236649a29507c7ed8c2d upstream. The creation of new subflows can fail for different reasons. If no subflow have been created using the received ADD_ADDR, the related counters should not be updated, otherwise they will never be decremented for events related to this ID later on. For the moment, the number of accepted ADD_ADDR is only decremented upon the reception of a related RM_ADDR, and only if the remote address ID is currently being used by at least one subflow. In other words, if no subflow can be created with the received address, the counter will not be decremented. In this case, it is then important not to increment pm.add_addr_accepted counter, and not to modify pm.accept_addr bit. Note that this patch does not modify the behaviour in case of failures later on, e.g. if the MP Join is dropped or rejected. The "remove invalid addresses" MP Join subtest has been modified to validate this case. The broadcast IP address is added before the "valid" address that will be used to successfully create a subflow, and the limit is decreased by one: without this patch, it was not possible to create the last subflow, because: - the broadcast address would have been accepted even if it was not usable: the creation of a subflow to this address results in an error, - the limit of 2 accepted ADD_ADDR would have then been reached. Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: YonglongLi Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 09469a081715fdbcd737f0f6bf4c3d03c979f2a0 Author: YonglongLi Date: Fri Jun 7 17:01:49 2024 +0200 mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID commit 6a09788c1a66e3d8b04b3b3e7618cc817bb60ae9 upstream. The RmAddr MIB counter is supposed to be incremented once when a valid RM_ADDR has been received. Before this patch, it could have been incremented as many times as the number of subflows connected to the linked address ID, so it could have been 0, 1 or more than 1. The "RmSubflow" is incremented after a local operation. In this case, it is normal to tied it with the number of subflows that have been actually removed. The "remove invalid addresses" MP Join subtest has been modified to validate this case. A broadcast IP address is now used instead: the client will not be able to create a subflow to this address. The consequence is that when receiving the RM_ADDR with the ID attached to this broadcast IP address, no subflow linked to this ID will be found. Fixes: 7a7e52e38a40 ("mptcp: add RM_ADDR related mibs") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: YonglongLi Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit f1f0a46f8bb8890b90ab7194f0a0c8fe2a3fb57f Author: Paolo Abeni Date: Fri Jun 7 17:01:48 2024 +0200 mptcp: ensure snd_una is properly initialized on connect commit 8031b58c3a9b1db3ef68b3bd749fbee2e1e1aaa3 upstream. This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt is properly initialized on connect"). It turns out that syzkaller can trigger the retransmit after fallback and before processing any other incoming packet - so that snd_una is still left uninitialized. Address the issue explicitly initializing snd_una together with snd_nxt and write_seq. Suggested-by: Mat Martineau Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 35bcf16b4a28c10923ff391d14f6ed0ae471ee5f Author: Marek Szyprowski Date: Thu Apr 25 11:48:51 2024 +0200 drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found commit 799d4b392417ed6889030a5b2335ccb6dcf030ab upstream. When reading EDID fails and driver reports no modes available, the DRM core adds an artificial 1024x786 mode to the connector. Unfortunately some variants of the Exynos HDMI (like the one in Exynos4 SoCs) are not able to drive such mode, so report a safe 640x480 mode instead of nothing in case of the EDID reading failure. This fixes the following issue observed on Trats2 board since commit 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()"): [drm] Exynos DRM: using 11c00000.fimd device for DMA mapping operations exynos-drm exynos-drm: bound 11c00000.fimd (ops fimd_component_ops) exynos-drm exynos-drm: bound 12c10000.mixer (ops mixer_component_ops) exynos-dsi 11c80000.dsi: [drm:samsung_dsim_host_attach] Attached s6e8aa0 device (lanes:4 bpp:24 mode-flags:0x10b) exynos-drm exynos-drm: bound 11c80000.dsi (ops exynos_dsi_component_ops) exynos-drm exynos-drm: bound 12d00000.hdmi (ops hdmi_component_ops) [drm] Initialized exynos 1.1.0 20180330 for exynos-drm on minor 1 exynos-hdmi 12d00000.hdmi: [drm:hdmiphy_enable.part.0] *ERROR* PLL could not reach steady state panel-samsung-s6e8aa0 11c80000.dsi.0: ID: 0xa2, 0x20, 0x8c exynos-mixer 12c10000.mixer: timeout waiting for VSYNC ------------[ cut here ]------------ WARNING: CPU: 1 PID: 11 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 [CRTC:70:crtc-1] vblank wait timed out Modules linked in: CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.9.0-rc5-next-20240424 #14913 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_unbound deferred_probe_work_func Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x7c/0x1c4 __warn from warn_slowpath_fmt+0x11c/0x1a8 warn_slowpath_fmt from drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 drm_atomic_helper_wait_for_vblanks.part.0 from drm_atomic_helper_commit_tail_rpm+0x7c/0x8c drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x184 commit_tail from drm_atomic_helper_commit+0x168/0x190 drm_atomic_helper_commit from drm_atomic_commit+0xb4/0xe0 drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x27c drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1cc drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40 drm_client_modeset_commit from __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0xc4 __drm_fb_helper_restore_fbdev_mode_unlocked from drm_fb_helper_set_par+0x2c/0x3c drm_fb_helper_set_par from fbcon_init+0x3d8/0x550 fbcon_init from visual_init+0xc0/0x108 visual_init from do_bind_con_driver+0x1b8/0x3a4 do_bind_con_driver from do_take_over_console+0x140/0x1ec do_take_over_console from do_fbcon_takeover+0x70/0xd0 do_fbcon_takeover from fbcon_fb_registered+0x19c/0x1ac fbcon_fb_registered from register_framebuffer+0x190/0x21c register_framebuffer from __drm_fb_helper_initial_config_and_unlock+0x350/0x574 __drm_fb_helper_initial_config_and_unlock from exynos_drm_fbdev_client_hotplug+0x6c/0xb0 exynos_drm_fbdev_client_hotplug from drm_client_register+0x58/0x94 drm_client_register from exynos_drm_bind+0x160/0x190 exynos_drm_bind from try_to_bring_up_aggregate_device+0x200/0x2d8 try_to_bring_up_aggregate_device from __component_add+0xb0/0x170 __component_add from mixer_probe+0x74/0xcc mixer_probe from platform_probe+0x5c/0xb8 platform_probe from really_probe+0xe0/0x3d8 really_probe from __driver_probe_device+0x9c/0x1e4 __driver_probe_device from driver_probe_device+0x30/0xc0 driver_probe_device from __device_attach_driver+0xa8/0x120 __device_attach_driver from bus_for_each_drv+0x80/0xcc bus_for_each_drv from __device_attach+0xac/0x1fc __device_attach from bus_probe_device+0x8c/0x90 bus_probe_device from deferred_probe_work_func+0x98/0xe0 deferred_probe_work_func from process_one_work+0x240/0x6d0 process_one_work from worker_thread+0x1a0/0x3f4 worker_thread from kthread+0x104/0x138 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf0895fb0 to 0xf0895ff8) ... irq event stamp: 82357 hardirqs last enabled at (82363): [] vprintk_emit+0x308/0x33c hardirqs last disabled at (82368): [] vprintk_emit+0x2bc/0x33c softirqs last enabled at (81614): [] __do_softirq+0x320/0x500 softirqs last disabled at (81609): [] __irq_exit_rcu+0x130/0x184 ---[ end trace 0000000000000000 ]--- exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CRTC:70:crtc-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CONNECTOR:74:HDMI-A-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [PLANE:56:plane-5] commit wait timed out exynos-mixer 12c10000.mixer: timeout waiting for VSYNC Cc: stable@vger.kernel.org Fixes: 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()") Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Greg Kroah-Hartman commit a269c5701244db2722ae0fce5d1854f5d8f31224 Author: Jani Nikula Date: Thu May 30 13:01:51 2024 +0300 drm/exynos/vidi: fix memory leak in .get_modes() commit 38e3825631b1f314b21e3ade00b5a4d737eb054e upstream. The duplicated EDID is never freed. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula Signed-off-by: Inki Dae Signed-off-by: Greg Kroah-Hartman commit fd880577c6d4b1102249adf48092cd7bba2d5139 Author: Mario Limonciello Date: Thu May 9 13:45:02 2024 -0500 ACPI: x86: Force StorageD3Enable on more products commit e79a10652bbd320649da705ca1ea0c04351af403 upstream. A Rembrandt-based HP thin client is reported to have problems where the NVME disk isn't present after resume from s2idle. This is because the NVME disk wasn't put into D3 at suspend, and that happened because the StorageD3Enable _DSD was missing in the BIOS. As AMD's architecture requires that the NVME is in D3 for s2idle, adjust the criteria for force_storage_d3 to match *all* Zen SoCs when the FADT advertises low power idle support. This will ensure that any future products with this BIOS deficiency don't need to be added to the allow list of overrides. Cc: All applicable Signed-off-by: Mario Limonciello Acked-by: Hans de Goede Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 5bf196f1936bf93df31112fbdfb78c03537c07b0 Author: John David Anglin Date: Mon Jun 10 18:47:07 2024 +0000 parisc: Try to fix random segmentation faults in package builds commit 72d95924ee35c8cd16ef52f912483ee938a34d49 upstream. PA-RISC systems with PA8800 and PA8900 processors have had problems with random segmentation faults for many years. Systems with earlier processors are much more stable. Systems with PA8800 and PA8900 processors have a large L2 cache which needs per page flushing for decent performance when a large range is flushed. The combined cache in these systems is also more sensitive to non-equivalent aliases than the caches in earlier systems. The majority of random segmentation faults that I have looked at appear to be memory corruption in memory allocated using mmap and malloc. My first attempt at fixing the random faults didn't work. On reviewing the cache code, I realized that there were two issues which the existing code didn't handle correctly. Both relate to cache move-in. Another issue is that the present bit in PTEs is racy. 1) PA-RISC caches have a mind of their own and they can speculatively load data and instructions for a page as long as there is a entry in the TLB for the page which allows move-in. TLBs are local to each CPU. Thus, the TLB entry for a page must be purged before flushing the page. This is particularly important on SMP systems. In some of the flush routines, the flush routine would be called and then the TLB entry would be purged. This was because the flush routine needed the TLB entry to do the flush. 2) My initial approach to trying the fix the random faults was to try and use flush_cache_page_if_present for all flush operations. This actually made things worse and led to a couple of hardware lockups. It finally dawned on me that some lines weren't being flushed because the pte check code was racy. This resulted in random inequivalent mappings to physical pages. The __flush_cache_page tmpalias flush sets up its own TLB entry and it doesn't need the existing TLB entry. As long as we can find the pte pointer for the vm page, we can get the pfn and physical address of the page. We can also purge the TLB entry for the page before doing the flush. Further, __flush_cache_page uses a special TLB entry that inhibits cache move-in. When switching page mappings, we need to ensure that lines are removed from the cache. It is not sufficient to just flush the lines to memory as they may come back. This made it clear that we needed to implement all the required flush operations using tmpalias routines. This includes flushes for user and kernel pages. After modifying the code to use tmpalias flushes, it became clear that the random segmentation faults were not fully resolved. The frequency of faults was worse on systems with a 64 MB L2 (PA8900) and systems with more CPUs (rp4440). The warning that I added to flush_cache_page_if_present to detect pages that couldn't be flushed triggered frequently on some systems. Helge and I looked at the pages that couldn't be flushed and found that the PTE was either cleared or for a swap page. Ignoring pages that were swapped out seemed okay but pages with cleared PTEs seemed problematic. I looked at routines related to pte_clear and noticed ptep_clear_flush. The default implementation just flushes the TLB entry. However, it was obvious that on parisc we need to flush the cache page as well. If we don't flush the cache page, stale lines will be left in the cache and cause random corruption. Once a PTE is cleared, there is no way to find the physical address associated with the PTE and flush the associated page at a later time. I implemented an updated change with a parisc specific version of ptep_clear_flush. It fixed the random data corruption on Helge's rp4440 and rp3440, as well as on my c8000. At this point, I realized that I could restore the code where we only flush in flush_cache_page_if_present if the page has been accessed. However, for this, we also need to flush the cache when the accessed bit is cleared in ptep_clear_flush_young to keep things synchronized. The default implementation only flushes the TLB entry. Other changes in this version are: 1) Implement parisc specific version of ptep_get. It's identical to default but needed in arch/parisc/include/asm/pgtable.h. 2) Revise parisc implementation of ptep_test_and_clear_young to use ptep_get (READ_ONCE). 3) Drop parisc implementation of ptep_get_and_clear. We can use default. 4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to use full data cache flush. 5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle VM_IOREMAP case in flush_cache_vmap. At this time, I don't know whether it is better to always flush when the PTE present bit is set or when both the accessed and present bits are set. The later saves flushing pages that haven't been accessed, but we need to flush in ptep_clear_flush_young. It also needs a page table lookup to find the PTE pointer. The lpa instruction only needs a page table lookup when the PTE entry isn't in the TLB. We don't atomically handle setting and clearing the _PAGE_ACCESSED bit. If we miss an update, we may miss a flush and the cache may get corrupted. Whether the current code is effectively atomic depends on process control. When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually be flushed when the PTE is cleared or in flush_cache_page_if_present. The _PAGE_ACCESSED bit is not used, so the problem is avoided. The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED define in cache.c. The default is 0. I didn't see a large difference in performance. Signed-off-by: John David Anglin Cc: # v6.6+ Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit a42b0060d6ff2f7e59290a26d5f162a3c6329b90 Author: Dirk Behme Date: Mon May 13 07:06:34 2024 +0200 drivers: core: synchronize really_probe() and dev_uevent() commit c0a40097f0bc81deafc15f9195d1fb54595cd6d0 upstream. Synchronize the dev->driver usage in really_probe() and dev_uevent(). These can run in different threads, what can result in the following race condition for dev->driver uninitialization: Thread #1: ========== really_probe() { ... probe_failed: ... device_unbind_cleanup(dev) { ... dev->driver = NULL; // <= Failed probe sets dev->driver to NULL ... } ... } Thread #2: ========== dev_uevent() { ... if (dev->driver) // If dev->driver is NULLed from really_probe() from here on, // after above check, the system crashes add_uevent_var(env, "DRIVER=%s", dev->driver->name); ... } really_probe() holds the lock, already. So nothing needs to be done there. dev_uevent() is called with lock held, often, too. But not always. What implies that we can't add any locking in dev_uevent() itself. So fix this race by adding the lock to the non-protected path. This is the path where above race is observed: dev_uevent+0x235/0x380 uevent_show+0x10c/0x1f0 <= Add lock here dev_attr_show+0x3a/0xa0 sysfs_kf_seq_show+0x17c/0x250 kernfs_seq_show+0x7c/0x90 seq_read_iter+0x2d7/0x940 kernfs_fop_read_iter+0xc6/0x310 vfs_read+0x5bc/0x6b0 ksys_read+0xeb/0x1b0 __x64_sys_read+0x42/0x50 x64_sys_call+0x27ad/0x2d30 do_syscall_64+0xcd/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Similar cases are reported by syzkaller in https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a But these are regarding the *initialization* of dev->driver dev->driver = drv; As this switches dev->driver to non-NULL these reports can be considered to be false-positives (which should be "fixed" by this commit, as well, though). The same issue was reported and tried to be fixed back in 2015 in https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/ already. Fixes: 239378f16aa1 ("Driver core: add uevent vars for devices of a class") Cc: stable Cc: syzbot+ffa8143439596313a85a@syzkaller.appspotmail.com Cc: Ashish Sangwan Cc: Namjae Jeon Signed-off-by: Dirk Behme Link: https://lore.kernel.org/r/20240513050634.3964461-1-dirk.behme@de.bosch.com Signed-off-by: Greg Kroah-Hartman commit e57c84e156e7c85f69905fcd4a09fd4168f544f9 Author: Jean-Baptiste Maneyrol Date: Mon May 27 21:00:08 2024 +0000 iio: imu: inv_icm42600: delete unneeded update watermark call commit 245f3b149e6cc3ac6ee612cdb7042263bfc9e73c upstream. Update watermark will be done inside the hwfifo_set_watermark callback just after the update_scan_mode. It is useless to do it here. Fixes: 7f85e42a6c54 ("iio: imu: inv_icm42600: add buffer support in iio devices") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol Link: https://lore.kernel.org/r/20240527210008.612932-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit aaf6b327a386c5e6aad3373263b41f55f90c0f4f Author: Jean-Baptiste Maneyrol Date: Fri May 24 12:48:51 2024 +0000 iio: invensense: fix odr switching to same value commit 95444b9eeb8c5c0330563931d70c61ca3b101548 upstream. ODR switching happens in 2 steps, update to store the new value and then apply when the ODR change flag is received in the data. When switching to the same ODR value, the ODR change flag is never happening, and frequency switching is blocked waiting for the never coming apply. Fix the issue by preventing update to happen when switching to same ODR value. Fixes: 0ecc363ccea7 ("iio: make invensense timestamp module generic") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol Link: https://lore.kernel.org/r/20240524124851.567485-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 8e472061a32c777a40ea75890015bfa2eab65665 Author: Marc Ferland Date: Wed May 1 11:05:54 2024 -0400 iio: dac: ad5592r: fix temperature channel scaling value commit 279428df888319bf68f2686934897301a250bb84 upstream. The scale value for the temperature channel is (assuming Vref=2.5 and the datasheet): 376.7897513 When calculating both val and val2 for the temperature scale we use (3767897513/25) and multiply it by Vref (here I assume 2500mV) to obtain: 2500 * (3767897513/25) ==> 376789751300 Finally we divide with remainder by 10^9 to get: val = 376 val2 = 789751300 However, we return IIO_VAL_INT_PLUS_MICRO (should have been NANO) as the scale type. So when converting the raw temperature value to the 'processed' temperature value we will get (assuming raw=810, offset=-753): processed = (raw + offset) * scale_val = (810 + -753) * 376 = 21432 processed += div((raw + offset) * scale_val2, 10^6) += div((810 + -753) * 789751300, 10^6) += 45015 ==> 66447 ==> 66.4 Celcius instead of the expected 21.5 Celsius. Fix this issue by changing IIO_VAL_INT_PLUS_MICRO to IIO_VAL_INT_PLUS_NANO. Fixes: 56ca9db862bf ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs") Signed-off-by: Marc Ferland Link: https://lore.kernel.org/r/20240501150554.1871390-1-marc.ferland@sonatest.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 24ff87bb9f0bae4e6619fd4ff20ebe11cf340eb6 Author: David Lechner Date: Fri May 3 14:45:05 2024 -0500 iio: adc: ad9467: fix scan type sign commit 8a01ef749b0a632f0e1f4ead0f08b3310d99fcb1 upstream. According to the IIO documentation, the sign in the scan type should be lower case. The ad9467 driver was incorrectly using upper case. Fix by changing to lower case. Fixes: 4606d0f4b05f ("iio: adc: ad9467: add support for AD9434 high-speed ADC") Fixes: ad6797120238 ("iio: adc: ad9467: add support AD9467 ADC") Signed-off-by: David Lechner Link: https://lore.kernel.org/r/20240503-ad9467-fix-scan-type-sign-v1-1-c7a1a066ebb9@baylibre.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit d4f3861893f94a2e85530f3279bb37b17e07278b Author: Benjamin Segall Date: Wed Jun 12 12:44:44 2024 -0700 x86/boot: Don't add the EFI stub to targets, again commit b2747f108b8034271fd5289bd8f3a7003e0775a3 upstream. This is a re-commit of da05b143a308 ("x86/boot: Don't add the EFI stub to targets") after the tagged patch incorrectly reverted it. vmlinux-objs-y is added to targets, with an assumption that they are all relative to $(obj); adding a $(objtree)/drivers/... path causes the build to incorrectly create a useless arch/x86/boot/compressed/drivers/... directory tree. Fix this just by using a different make variable for the EFI stub. Fixes: cb8bda8ad443 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S") Signed-off-by: Ben Segall Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Ard Biesheuvel Cc: stable@vger.kernel.org # v6.1+ Link: https://lore.kernel.org/r/xm267ceukksz.fsf@bsegall.svl.corp.google.com Signed-off-by: Greg Kroah-Hartman commit db20d4e4872bb6f9285b0ee1ea022f889e657a3a Author: Namjae Jeon Date: Tue Jun 11 23:27:27 2024 +0900 ksmbd: fix missing use of get_write in in smb2_set_ea() commit 2bfc4214c69c62da13a9da8e3c3db5539da2ccd3 upstream. Fix an issue where get_write is not used in smb2_set_ea(). Fixes: 6fc0a265e1b9 ("ksmbd: fix potential circular locking issue in smb2_set_ea()") Cc: stable@vger.kernel.org Reported-by: Wang Zhaolong Signed-off-by: Namjae Jeon Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 13b38f9262f5e57a700569570d9109fce9875604 Author: Namjae Jeon Date: Mon Jun 10 23:06:19 2024 +0900 ksmbd: move leading slash check to smb2_get_name() commit 1cdeca6a7264021e20157de0baf7880ff0ced822 upstream. If the directory name in the root of the share starts with character like 镜(0x955c) or Ṝ(0x1e5c), it (and anything inside) cannot be accessed. The leading slash check must be checked after converting unicode to nls string. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 86c9713602f786f441630c4ee02891987f8618b9 Author: Yongzhi Liu Date: Thu May 23 20:14:33 2024 +0800 misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe() commit 086c6cbcc563c81d55257f9b27e14faf1d0963d3 upstream. When auxiliary_device_add() returns error and then calls auxiliary_device_uninit(), callback function gp_auxiliary_device_release() calls ida_free() and kfree(aux_device_wrapper) to free memory. We should't call them again in the error handling path. Fix this by skipping the redundant cleanup functions. Fixes: 393fc2f5948f ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.") Signed-off-by: Yongzhi Liu Link: https://lore.kernel.org/r/20240523121434.21855-3-hyperlyzcs@gmail.com Signed-off-by: Greg Kroah-Hartman commit ca6660c956242623b4cfe9be2a1abc67907c44bf Author: Aleksandr Mishin Date: Tue Jun 11 11:25:46 2024 +0300 bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send() [ Upstream commit a9b9741854a9fe9df948af49ca5514e0ed0429df ] In case of token is released due to token->state == BNXT_HWRM_DEFERRED, released token (set to NULL) is used in log messages. This issue is expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But this error code is returned by recent firmware. So some firmware may not return it. This may lead to NULL pointer dereference. Adjust this issue by adding token pointer check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 8fa4219dba8e ("bnxt_en: add dynamic debug support for HWRM messages") Suggested-by: Michael Chan Signed-off-by: Aleksandr Mishin Reviewed-by: Wojciech Drewek Reviewed-by: Michael Chan Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 185c72f6b9ebd87f609380614145c58a9dc22a36 Author: Rao Shoaib Date: Tue Jun 11 01:46:39 2024 -0700 af_unix: Read with MSG_PEEK loops if the first unread byte is OOB [ Upstream commit a6736a0addd60fccc3a3508461d72314cc609772 ] Read with MSG_PEEK flag loops if the first byte to read is an OOB byte. commit 22dd70eb2c3d ("af_unix: Don't peek OOB data without MSG_OOB.") addresses the loop issue but does not address the issue that no data beyond OOB byte can be read. >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) >>> c1.send(b'a', MSG_OOB) 1 >>> c1.send(b'b') 1 >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'b' >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) >>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1) >>> c1.send(b'a', MSG_OOB) 1 >>> c1.send(b'b') 1 >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'b' >>> Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 183ebc167a8a19e916b885d4bb61a3491991bfa5 Author: Taehee Yoo Date: Wed Jun 12 06:04:46 2024 +0000 ionic: fix use after netif_napi_del() [ Upstream commit 79f18a41dd056115d685f3b0a419c7cd40055e13 ] When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del(). Reproducer: ethtool -L rx 1 tx 1 combined 0 ethtool -L rx 0 tx 0 combined 1 ethtool -L rx 0 tx 0 combined 4 Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo Reviewed-by: Brett Creeley Reviewed-by: Shannon Nelson Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 7caefa2771722e65496d85b62e1dc4442b7d1345 Author: Nikolay Aleksandrov Date: Sun Jun 9 13:36:54 2024 +0300 net: bridge: mst: fix suspicious rcu usage in br_mst_set_state [ Upstream commit 546ceb1dfdac866648ec959cbc71d9525bd73462 ] I converted br_mst_set_state to RCU to avoid a vlan use-after-free but forgot to change the vlan group dereference helper. Switch to vlan group RCU deref helper to fix the suspicious rcu usage warning. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit a6cc9e9a651b9861efa068c164ee62dfba68c6ca Author: Nikolay Aleksandrov Date: Sun Jun 9 13:36:53 2024 +0300 net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state [ Upstream commit 36c92936e868601fa1f43da6758cf55805043509 ] Pass the already obtained vlan group pointer to br_mst_vlan_set_state() instead of dereferencing it again. Each caller has already correctly dereferenced it for their context. This change is required for the following suspicious RCU dereference fix. No functional changes intended. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 96d3265fc4f1b68e60a5ae7a51b97463cc8c6702 Author: Petr Pavlu Date: Fri Jun 7 13:28:28 2024 +0200 net/ipv6: Fix the RT cache flush via sysctl using a previous delay [ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ] The net.ipv6.route.flush system parameter takes a value which specifies a delay used during the flush operation for aging exception routes. The written value is however not used in the currently requested flush and instead utilized only in the next one. A problem is that ipv6_sysctl_rtcache_flush() first reads the old value of net->ipv6.sysctl.flush_delay into a local delay variable and then calls proc_dointvec() which actually updates the sysctl based on the provided input. Fix the problem by switching the order of the two operations. Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.") Signed-off-by: Petr Pavlu Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 9a3eb4816ab9af25dd2357783e591ef66d5fe616 Author: Daniel Wagner Date: Wed Jun 12 16:02:40 2024 +0200 nvmet-passthru: propagate status from id override functions [ Upstream commit d76584e53f4244dbc154bec447c3852600acc914 ] The id override functions return a status which is not propagated to the caller. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Daniel Wagner Reviewed-by: Chaitanya Kulkarni Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Sasha Levin commit fe1e395563ccb051e9dbd8fa99859f5caaad2e71 Author: Chengming Zhou Date: Sat Jun 8 22:31:15 2024 +0800 block: fix request.queuelist usage in flush [ Upstream commit d0321c812d89c5910d8da8e4b10c891c6b96ff70 ] Friedrich Weber reported a kernel crash problem and bisected to commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine"). The root cause is that we use "list_move_tail(&rq->queuelist, pending)" in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch(). We don't initialize its queuelist just for this first request, although the queuelist of all later popped requests will be initialized. Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so rq->queuelist doesn't need to be initialized. It should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH, has no move actually. Please note the commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") also has another requirement that no drivers would touch rq->queuelist after blk_mq_end_request() since we will reuse it to add rq to the post-flush pending list in POSTFLUSH. If this is not true, we will have to revert that commit IMHO. This updated version adds "list_del_init(&rq->queuelist)" in flush rq callback since the dm layer may submit request of a weird invalid format (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH), which causes double list_add if without this "list_del_init(&rq->queuelist)". The weird invalid format problem should be fixed in dm layer. Reported-by: Friedrich Weber Closes: https://lore.kernel.org/lkml/14b89dfb-505c-49f7-aebb-01c54451db40@proxmox.com/ Closes: https://lore.kernel.org/lkml/c9d03ff7-27c5-4ebd-b3f6-5a90d96f35ba@proxmox.com/ Fixes: 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") Cc: Christoph Hellwig Cc: ming.lei@redhat.com Cc: bvanassche@acm.org Tested-by: Friedrich Weber Signed-off-by: Chengming Zhou Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20240608143115.972486-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit 6b7155458ed20b42ee079c7a5f96589bdc1a75d4 Author: Su Hui Date: Tue Jun 11 15:37:00 2024 +0800 block: sed-opal: avoid possible wrong address reference in read_sed_opal_key() [ Upstream commit 9b1ebce6a1fded90d4a1c6c57dc6262dac4c4c14 ] Clang static checker (scan-build) warning: block/sed-opal.c:line 317, column 3 Value stored to 'ret' is never read. Fix this problem by returning the error code when keyring_search() failed. Otherwise, 'key' will have a wrong value when 'kerf' stores the error code. Fixes: 3bfeb6125664 ("block: sed-opal: keyring support for SED keys") Signed-off-by: Su Hui Link: https://lore.kernel.org/r/20240611073659.429582-1-suhui@nfschina.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit 35119b1139e74edbc247d85fdc0ebd4635d17f77 Author: Xiaolei Wang Date: Sat Jun 8 22:35:24 2024 +0800 net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters [ Upstream commit be27b896529787e23a35ae4befb6337ce73fcca0 ] The current cbs parameter depends on speed after uplinking, which is not needed and will report a configuration error if the port is not initially connected. The UAPI exposed by tc-cbs requires userspace to recalculate the send slope anyway, because the formula depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. Therefore, we use offload->sendslope and offload->idleslope to derive the original port_transmit_rate from the CBS formula. Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Signed-off-by: Xiaolei Wang Reviewed-by: Wojciech Drewek Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20240608143524.2065736-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit d01f39f73bed4cb66ce28374b20d3e447237aeed Author: Joshua Washington Date: Mon Jun 10 15:57:18 2024 -0700 gve: ignore nonrelevant GSO type bits when processing TSO headers [ Upstream commit 1b9f756344416e02b41439bf2324b26aa25e141c ] TSO currently fails when the skb's gso_type field has more than one bit set. TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes virtualization, such as QEMU, a real use-case. The gso_type and gso_size fields as passed from userspace in virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type |= SKB_GSO_DODGY to force the packet to enter the software GSO stack for verification. This issue might similarly come up when the CWR bit is set in the TCP header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit to be set. Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Joshua Washington Reviewed-by: Praveen Kaligineedi Reviewed-by: Harshitha Ramamurthy Reviewed-by: Willem de Bruijn Suggested-by: Eric Dumazet Acked-by: Andrei Vagin v2 - Remove unnecessary comments, remove line break between fixes tag and signoffs. v3 - Add back unrelated empty line removal. Link: https://lore.kernel.org/r/20240610225729.2985343-1-joshwash@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit b0c95cefd9b651c8915a906bd3a8cd2e8fa1e015 Author: Kory Maincent Date: Mon Jun 10 10:34:26 2024 +0200 net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP [ Upstream commit 144ba8580bcb82b2686c3d1a043299d844b9a682 ] ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by checkpatch script. Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Andrew Lunn Acked-by: Oleksij Rempel Signed-off-by: Kory Maincent Link: https://lore.kernel.org/r/20240610083426.740660-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 2ad10c2aadb762b3625f57380d2d084f69e815ab Author: Ziqi Chen Date: Fri Jun 7 18:06:23 2024 +0800 scsi: ufs: core: Quiesce request queues before checking pending cmds [ Upstream commit 77691af484e28af7a692e511b9ed5ca63012ec6e ] In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked, ufshcd_pending_cmds() is called to check whether there are pending transactions or not. And only if there are no pending transactions can we proceed to kickstart the clock scaling sequence. ufshcd_pending_cmds() traverses over all SCSI devices and calls sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down to three steps: 1. Calculate the nr outstanding bits set in the 'word' bitmap. 2. Calculate the nr outstanding bits set in the 'cleared' bitmap. 3. Subtract the result from step 1 by the result from step 2. This can lead to a race condition as outlined below: Assume there is one pending transaction in the request queue of one SCSI device, say sda, and the budget token of this request is 0, the 'word' is 0x1 and the 'cleared' is 0x0. 1. When step 1 executes, it gets the result as 1. 2. Before step 2 executes, block layer tries to dispatch a new request to sda. Since the SCSI layer is blocked, the request cannot pass through SCSI but the block layer would do budget_get() and budget_put() to sda's budget map regardless, so the 'word' has become 0x3 and 'cleared' has become 0x2 (assume the new request got budget token 1). 3. When step 2 executes, it gets the result as 1. 4. When step 3 executes, it gets the result as 0, meaning there is no pending transactions, which is wrong. Thread A Thread B ufshcd_pending_cmds() __blk_mq_sched_dispatch_requests() | | sbitmap_weight(word) | | scsi_mq_get_budget() | | | scsi_mq_put_budget() | | sbitmap_weight(cleared) ... When this race condition happens, the clock scaling sequence is started with transactions still in flight, leading to subsequent hibernate enter failure, broken link, task abort and back to back error recovery. Fix this race condition by quiescing the request queues before calling ufshcd_pending_cmds() so that block layer won't touch the budget map when ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer blocking/unblocking to reduce redundancies and latencies. Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Co-developed-by: Can Guo Signed-off-by: Can Guo Signed-off-by: Ziqi Chen Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 567cfc59e4682185edd8cee9bb2dfc0576a63348 Author: Kees Cook Date: Mon Jun 10 14:02:27 2024 -0700 x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking [ Upstream commit 8c860ed825cb85f6672cd7b10a8f33e3498a7c81 ] When reworking the range checking for get_user(), the get_user_8() case on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8 was accidentally dropped.) Restore the correct error handling destination (and rename the jump to using the expected ".L" prefix). While here, switch to using a named argument ("size") for the call template ("%c4" to "%c[size]") as already used in the other call templates in this file. Found after moving the usercopy selftests to KUnit: # usercopy_test_invalid: EXPECTATION FAILED at lib/usercopy_kunit.c:278 Expected val_u64 == 0, but val_u64 == -60129542144 (0xfffffff200000000) Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKVmGQ@mail.gmail.com Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Reported-by: David Gow Signed-off-by: Kees Cook Signed-off-by: Dave Hansen Reviewed-by: Kirill A. Shutemov Reviewed-by: Qiuxu Zhuo Tested-by: David Gow Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org Signed-off-by: Sasha Levin commit 5396ce9a5e68299a3794a79cbd1cd0286cf1f22c Author: Uros Bizjak Date: Tue Mar 19 11:40:13 2024 +0100 x86/asm: Use %c/%n instead of %P operand modifier in asm templates [ Upstream commit 41cd2e1ee96e56401a18dbce6f42f0bdaebcbf3b ] The "P" asm operand modifier is a x86 target-specific modifier. When used with a constant, the "P" modifier emits "cst" instead of "$cst". This property is currently used to emit the bare constant without all syntax-specific prefixes. The generic "c" resp. "n" operand modifier should be used instead. No functional changes intended. Signed-off-by: Uros Bizjak Signed-off-by: Ingo Molnar Cc: Linus Torvalds Cc: Josh Poimboeuf Cc: Ard Biesheuvel Cc: "H. Peter Anvin" Link: https://lore.kernel.org/r/20240319104418.284519-3-ubizjak@gmail.com Stable-dep-of: 8c860ed825cb ("x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking") Signed-off-by: Sasha Levin commit 2ba35b37f780c6410bb4bba9c3072596d8576702 Author: Jozsef Kadlecsik Date: Tue Jun 4 15:58:03 2024 +0200 netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type [ Upstream commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10 ] Lion Ackermann reported that there is a race condition between namespace cleanup in ipset and the garbage collection of the list:set type. The namespace cleanup can destroy the list:set type of sets while the gc of the set type is waiting to run in rcu cleanup. The latter uses data from the destroyed set which thus leads use after free. The patch contains the following parts: - When destroying all sets, first remove the garbage collectors, then wait if needed and then destroy the sets. - Fix the badly ordered "wait then remove gc" for the destroy a single set case. - Fix the missing rcu locking in the list:set type in the userspace test case. - Use proper RCU list handlings in the list:set type. The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc). Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation) Reported-by: Lion Ackermann Tested-by: Lion Ackermann Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin commit b30669fdea0ca03aa22995e6c99f7e7d9dee89ff Author: Davide Ornaghi Date: Wed Jun 5 13:03:45 2024 +0200 netfilter: nft_inner: validate mandatory meta and payload [ Upstream commit c4ab9da85b9df3692f861512fe6c9812f38b7471 ] Check for mandatory netlink attributes in payload and meta expression when used embedded from the inner expression, otherwise NULL pointer dereference is possible from userspace. Fixes: a150d122b6bd ("netfilter: nft_meta: add inner match support") Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Davide Ornaghi Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin commit 7ccca396e989623facf6f3aba698ca89874592c0 Author: Pauli Virtanen Date: Sun Jun 9 18:06:20 2024 +0300 Bluetooth: fix connection setup in l2cap_connect [ Upstream commit c695439d198d30e10553a3b98360c5efe77b6903 ] The amp_id argument of l2cap_connect() was removed in commit 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support") It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie. non-AMP controller). In the above commit, the code path for amp_id != 0 was preserved, although it should have used the amp_id == 0 one. Restore the previous behavior of the non-AMP code path, to fix problems with L2CAP connections. Fixes: 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin commit 599a28fa9ecd98f7c2937e4215a7788403d1cfd6 Author: Luiz Augusto von Dentz Date: Mon May 20 16:03:07 2024 -0400 Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ [ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ] This removes the bogus check for max > hcon->le_conn_max_interval since the later is just the initial maximum conn interval not the maximum the stack could support which is really 3200=4000ms. In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values of the following fields in IXIT that would cause hci_check_conn_params to fail: TSPX_conn_update_int_min TSPX_conn_update_int_max TSPX_conn_update_peripheral_latency TSPX_conn_update_supervision_timeout Link: https://github.com/bluez/bluez/issues/847 Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval") Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin commit 110764a0713e921907216b8f131f0386a7d4a7f7 Author: Gal Pressman Date: Thu Jun 6 23:32:49 2024 +0300 net/mlx5e: Fix features validation check for tunneled UDP (non-VXLAN) packets [ Upstream commit 791b4089e326271424b78f2fae778b20e53d071b ] Move the vxlan_features_check() call to after we verified the packet is a tunneled VXLAN packet. Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might wrongly not get offloaded. In some cases, it worked by chance as GENEVE header is the same size as VXLAN, but it is obviously incorrect. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Gal Pressman Reviewed-by: Dragos Tatulea Signed-off-by: Tariq Toukan Reviewed-by: Wojciech Drewek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit bb5c1b0fbd983c3d7af2f4145adc62821323f6ee Author: Gal Pressman Date: Thu Jun 6 23:32:48 2024 +0300 geneve: Fix incorrect inner network header offset when innerprotoinherit is set [ Upstream commit c6ae073f5903f6c6439d0ac855836a4da5c0a701 ] When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not always assume the header length is ETH_HLEN, as there might not be a MAC header. This resolves issues with drivers (e.g. mlx5, in mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset to be correct, and use it for TX offloads. Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb") Signed-off-by: Gal Pressman Signed-off-by: Tariq Toukan Reviewed-by: Wojciech Drewek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit c72660999c17fa2670ba132c529e8b35a34025e5 Author: Andy Shevchenko Date: Thu Jun 6 19:13:03 2024 +0300 net dsa: qca8k: fix usages of device_get_named_child_node() [ Upstream commit d029edefed39647c797c2710aedd9d31f84c069e ] The documentation for device_get_named_child_node() mentions this important point: " The caller is responsible for calling fwnode_handle_put() on the returned fwnode pointer. " Add fwnode_handle_put() to avoid leaked references. Fixes: 1e264f9d2918 ("net: dsa: qca8k: add LEDs basic support") Reviewed-by: Simon Horman Signed-off-by: Andy Shevchenko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit d61808ac9947e5b6ebd2123d140d01ba9dfd05ab Author: Eric Dumazet Date: Thu Jun 6 15:46:51 2024 +0000 tcp: fix race in tcp_v6_syn_recv_sock() [ Upstream commit d37fe4255abe8e7b419b90c5847e8ec2b8debb08 ] tcp_v6_syn_recv_sock() calls ip6_dst_store() before inet_sk(newsk)->pinet6 has been set up. This means ip6_dst_store() writes over the parent (listener) np->dst_cookie. This is racy because multiple threads could share the same parent and their final np->dst_cookie could be wrong. Move ip6_dst_store() call after inet_sk(newsk)->pinet6 has been changed and after the copy of parent ipv6_pinfo. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b5c8ffc8cfde6ac6b05188e008518497512814b1 Author: Adam Miotk Date: Mon Jun 10 11:27:39 2024 +0100 drm/bridge/panel: Fix runtime warning on panel bridge release [ Upstream commit ce62600c4dbee8d43b02277669dd91785a9b81d9 ] Device managed panel bridge wrappers are created by calling to drm_panel_bridge_add_typed() and registering a release handler for clean-up when the device gets unbound. Since the memory for this bridge is also managed and linked to the panel device, the release function should not try to free that memory. Moreover, the call to devm_kfree() inside drm_panel_bridge_remove() will fail in this case and emit a warning because the panel bridge resource is no longer on the device resources list (it has been removed from there before the call to release handlers). Fixes: 67022227ffb1 ("drm/bridge: Add a devm_ allocator for panel bridge.") Signed-off-by: Adam Miotk Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/20240610102739.139852-1-adam.miotk@arm.com Signed-off-by: Sasha Levin commit 9460961d82134ceda7377b77a3e3e3531b625dfe Author: Amjad Ouled-Ameur Date: Mon Jun 10 11:20:56 2024 +0100 drm/komeda: check for error-valued pointer [ Upstream commit b880018edd3a577e50366338194dee9b899947e0 ] komeda_pipeline_get_state() may return an error-valued pointer, thus check the pointer for negative or null value before dereferencing. Fixes: 502932a03fce ("drm/komeda: Add the initial scaler support for CORE") Signed-off-by: Amjad Ouled-Ameur Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/20240610102056.40406-1-amjad.ouled-ameur@arm.com Signed-off-by: Sasha Levin commit f100031fd6a570e14312c283202db955cb5f56d1 Author: Sagar Cheluvegowda Date: Wed Jun 5 11:57:18 2024 -0700 net: stmmac: dwmac-qcom-ethqos: Configure host DMA width [ Upstream commit 0579f27249047006a818e463ee66a6c314d04cea ] Commit 070246e4674b ("net: stmmac: Fix for mismatched host/device DMA address width") added support in the stmmac driver for platform drivers to indicate the host DMA width, but left it up to authors of the specific platforms to indicate if their width differed from the addr64 register read from the MAC itself. Qualcomm's EMAC4 integration supports only up to 36 bit width (as opposed to the addr64 register indicating 40 bit width). Let's indicate that in the platform driver to avoid a scenario where the driver will allocate descriptors of size that is supported by the CPU which in our case is 36 bit, but as the addr64 register is still capable of 40 bits the device will use two descriptors as one address. Fixes: 8c4d92e82d50 ("net: stmmac: dwmac-qcom-ethqos: add support for emac4 on sa8775p platforms") Signed-off-by: Sagar Cheluvegowda Reviewed-by: Simon Horman Reviewed-by: Andrew Halaney Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit fd2b613bc4c508e55c1221c6595bb889812a4fea Author: Aleksandr Mishin Date: Wed Jun 5 13:11:35 2024 +0300 liquidio: Adjust a NULL pointer handling path in lio_vf_rep_copy_packet [ Upstream commit c44711b78608c98a3e6b49ce91678cd0917d5349 ] In lio_vf_rep_copy_packet() pg_info->page is compared to a NULL value, but then it is unconditionally passed to skb_add_rx_frag() which looks strange and could lead to null pointer dereference. lio_vf_rep_copy_packet() call trace looks like: octeon_droq_process_packets octeon_droq_fast_process_packets octeon_droq_dispatch_pkt octeon_create_recv_info ...search in the dispatch_list... ->disp_fn(rdisp->rinfo, ...) lio_vf_rep_pkt_recv(struct octeon_recv_info *recv_info, ...) In this path there is no code which sets pg_info->page to NULL. So this check looks unneeded and doesn't solve potential problem. But I guess the author had reason to add a check and I have no such card and can't do real test. In addition, the code in the function liquidio_push_packet() in liquidio/lio_core.c does exactly the same. Based on this, I consider the most acceptable compromise solution to adjust this issue by moving skb_add_rx_frag() into conditional scope. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1f233f327913 ("liquidio: switchdev support for LiquidIO NIC") Signed-off-by: Aleksandr Mishin Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 47016dcb50e9ff5d170b04c8c22cb63900372bca Author: Jie Wang Date: Wed Jun 5 15:20:58 2024 +0800 net: hns3: add cond_resched() to hns3 ring buffer init process [ Upstream commit 968fde83841a8c23558dfbd0a0c69d636db52b55 ] Currently hns3 ring buffer init process would hold cpu too long with big Tx/Rx ring depth. This could cause soft lockup. So this patch adds cond_resched() to the process. Then cpu can break to run other tasks instead of busy looping. Fixes: a723fb8efe29 ("net: hns3: refine for set ring parameters") Signed-off-by: Jie Wang Signed-off-by: Jijie Shao Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 689de7c3bfc7d47e0eacc641c4ce4a0f579aeefa Author: Yonglong Liu Date: Wed Jun 5 15:20:57 2024 +0800 net: hns3: fix kernel crash problem in concurrent scenario [ Upstream commit 12cda920212a49fa22d9e8b9492ac4ea013310a4 ] When link status change, the nic driver need to notify the roce driver to handle this event, but at this time, the roce driver may uninit, then cause kernel crash. To fix the problem, when link status change, need to check whether the roce registered, and when uninit, need to wait link update finish. Fixes: 45e92b7e4e27 ("net: hns3: add calling roce callback function when link status change") Signed-off-by: Yonglong Liu Signed-off-by: Jijie Shao Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit ef01c26d6f7a4302e7b515629239fbd0e5353ad5 Author: Csókás, Bence Date: Wed Jun 5 10:42:51 2024 +0200 net: sfp: Always call `sfp_sm_mod_remove()` on remove [ Upstream commit e96b2933152fd87b6a41765b2f58b158fde855b6 ] If the module is in SFP_MOD_ERROR, `sfp_sm_mod_remove()` will not be run. As a consequence, `sfp_hwmon_remove()` is not getting run either, leaving a stale `hwmon` device behind. `sfp_sm_mod_remove()` itself checks `sfp->sm_mod_state` anyways, so this check was not really needed in the first place. Fixes: d2e816c0293f ("net: sfp: handle module remove outside state machine") Signed-off-by: "Csókás, Bence" Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20240605084251.63502-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 6797259d9b92f682274e3c2777feca46b1e15d2a Author: Masahiro Yamada Date: Fri Jun 7 03:36:12 2024 +0900 modpost: do not warn about missing MODULE_DESCRIPTION() for vmlinux.o [ Upstream commit 9185afeac2a3dcce8300a5684291a43c2838cfd6 ] Building with W=1 incorrectly emits the following warning: WARNING: modpost: missing MODULE_DESCRIPTION() in vmlinux.o This check should apply only to modules. Fixes: 1fffe7a34c89 ("script: modpost: emit a warning when the description is missing") Signed-off-by: Masahiro Yamada Reviewed-by: Vincenzo Palazzo Signed-off-by: Sasha Levin commit 6fdc1152afaef6845bd38ec50f512a02d187f5f0 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:32 2024 -0700 af_unix: Annotate data-race of sk->sk_state in unix_accept(). [ Upstream commit 1b536948e805aab61a48c5aa5db10c9afee880bd ] Once sk->sk_state is changed to TCP_LISTEN, it never changes. unix_accept() takes the advantage and reads sk->sk_state without holding unix_state_lock(). Let's use READ_ONCE() there. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit b82c97a79b8eadcaeab630a0206ea58654854232 Author: Ian Forbes Date: Thu Mar 28 14:07:16 2024 -0500 drm/vmwgfx: Don't memcmp equivalent pointers [ Upstream commit 5703fc058efdafcdd6b70776ee562478f0753acb ] These pointers are frequently the same and memcmp does not compare the pointers before comparing their contents so this was wasting cycles comparing 16 KiB of memory which will always be equal. Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds") Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240328190716.27367-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin commit ce48b688a8d2ecea6c2d2c225f908e76c36a04a7 Author: Ian Forbes Date: Tue May 21 13:47:19 2024 -0500 drm/vmwgfx: Remove STDU logic from generic mode_valid function [ Upstream commit dde1de06bd7248fd83c4ce5cf0dbe9e4e95bbb91 ] STDU has its own mode_valid function now so this logic can be removed from the generic version. Fixes: 935f795045a6 ("drm/vmwgfx: Refactor drm connector probing for display modes") Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbes@broadcom.com Signed-off-by: Sasha Levin commit 15a875ecfc2f33a996199b7254331c332621fa3a Author: Ian Forbes Date: Tue May 21 13:47:18 2024 -0500 drm/vmwgfx: 3D disabled should not effect STDU memory limits [ Upstream commit fb5e19d2dd03eb995ccd468d599b2337f7f66555 ] This limit became a hard cap starting with the change referenced below. Surface creation on the device will fail if the requested size is larger than this limit so altering the value arbitrarily will expose modes that are too large for the device's hard limits. Fixes: 7ebb47c9f9ab ("drm/vmwgfx: Read new register for GB memory when available") Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com Signed-off-by: Sasha Levin commit 3ca8e582e03ee407771065f7c9bbbd404f8e5316 Author: Ian Forbes Date: Tue May 21 13:47:17 2024 -0500 drm/vmwgfx: Filter modes which exceed graphics memory [ Upstream commit 426826933109093503e7ef15d49348fc5ab505fe ] SVGA requires individual surfaces to fit within graphics memory (max_mob_pages) which means that modes with a final buffer size that would exceed graphics memory must be pruned otherwise creation will fail. Additionally llvmpipe requires its buffer height and width to be a multiple of its tile size which is 64. As a result we have to anticipate that llvmpipe will round up the mode size passed to it by the compositor when it creates buffers and filter modes where this rounding exceeds graphics memory. This fixes an issue where VMs with low graphics memory (< 64MiB) configured with high resolution mode boot to a black screen because surface creation fails. Fixes: d947d1b71deb ("drm/vmwgfx: Add and connect connector helper function") Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbes@broadcom.com Signed-off-by: Sasha Levin commit b0b05171de1fe3bd7bc28d8480599e6f9fba8f1a Author: Martin Krastev Date: Fri Jan 26 15:08:00 2024 -0500 drm/vmwgfx: Refactor drm connector probing for display modes [ Upstream commit 935f795045a6f9b13d28d46ebdad04bfea8750dd ] Implement drm_connector_helper_funcs.mode_valid and .get_modes, replacing custom drm_connector_funcs.fill_modes code with drm_helper_probe_single_connector_modes; for STDU, LDU & SOU display units. Signed-off-by: Martin Krastev Reviewed-by: Zack Rusin Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240126200804.732454-2-zack.rusin@broadcom.com Stable-dep-of: 426826933109 ("drm/vmwgfx: Filter modes which exceed graphics memory") Signed-off-by: Sasha Levin commit f677ca8cfefee2a729ca315f660cd4868abdf8de Author: José Expósito Date: Fri May 24 15:05:39 2024 +0200 HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode() [ Upstream commit ce3af2ee95170b7d9e15fff6e500d67deab1e7b3 ] Fix a memory leak on logi_dj_recv_send_report() error path. Fixes: 6f20d3261265 ("HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()") Signed-off-by: José Expósito Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin commit 1bbadf953fad5b879e3780b56f37e31376117a54 Author: Su Hui Date: Tue Jun 4 20:12:43 2024 +0800 io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue() [ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ] Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051, column 3 The expression is an uninitialized value. The computed value will also be garbage. 'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is not fully initialized. Change the order of assignment for 'match' to fix this problem. Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Su Hui Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit ab702c3483db9046bab9f40306f1a28b22dbbdc0 Author: Breno Leitao Date: Tue May 7 10:00:01 2024 -0700 io_uring/io-wq: Use set_bit() and test_bit() at worker->flags [ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ] Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to address potential data races. The structure io_worker->flags may be accessed through various data paths, leading to concurrency issues. When KCSAN is enabled, it reveals data races occurring in io_worker_handle_work and io_wq_activate_free_worker functions. BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28: io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569) io_wq_worker (io_uring/io-wq.c:?) read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5: io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285) io_wq_enqueue (io_uring/io-wq.c:947) io_queue_iowq (io_uring/io_uring.c:524) io_req_task_submit (io_uring/io_uring.c:1511) io_handle_tw_list (io_uring/io_uring.c:1198) Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm"). These races involve writes and reads to the same memory location by different tasks running on different CPUs. To mitigate this, refactor the code to use atomic operations such as set_bit(), test_bit(), and clear_bit() instead of basic "and" and "or" operations. This ensures thread-safe manipulation of worker flags. Also, move `create_index` to avoid holes in the structure. Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org Signed-off-by: Jens Axboe Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()") Signed-off-by: Sasha Levin commit 7388ae6f26c0ba95f70cc96bf9c5d5cb06c908b6 Author: Lu Baolu Date: Tue May 28 12:25:28 2024 +0800 iommu: Return right value in iommu_sva_bind_device() [ Upstream commit 89e8a2366e3bce584b6c01549d5019c5cda1205e ] iommu_sva_bind_device() should return either a sva bond handle or an ERR_PTR value in error cases. Existing drivers (idxd and uacce) only check the return value with IS_ERR(). This could potentially lead to a kernel NULL pointer dereference issue if the function returns NULL instead of an error pointer. In reality, this doesn't cause any problems because iommu_sva_bind_device() only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA. In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will return an error, and the device drivers won't call iommu_sva_bind_device() at all. Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices") Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Reviewed-by: Kevin Tian Reviewed-by: Vasant Hegde Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit c344d7030717b26943eed347de04a93d949e82b1 Author: Kun(llfl) Date: Thu May 9 08:42:20 2024 +0800 iommu/amd: Fix sysfs leak in iommu init [ Upstream commit a295ec52c8624883885396fde7b4df1a179627c3 ] During the iommu initialization, iommu_init_pci() adds sysfs nodes. However, these nodes aren't remove in free_iommu_resources() subsequently. Fixes: 39ab9555c241 ("iommu: Add sysfs bindings for struct iommu_device") Signed-off-by: Kun(llfl) Reviewed-by: Suravee Suthikulpanit Link: https://lore.kernel.org/r/c8e0d11c6ab1ee48299c288009cf9c5dae07b42d.1715215003.git.llfl@linux.alibaba.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit 30f76bc468b9b2cbbd5d3eb482661e3e4798893f Author: Nikita Zhandarovich Date: Fri May 17 07:19:14 2024 -0700 HID: core: remove unnecessary WARN_ON() in implement() [ Upstream commit 4aa2dcfbad538adf7becd0034a3754e1bd01b2b5 ] Syzkaller hit a warning [1] in a call to implement() when trying to write a value into a field of smaller size in an output report. Since implement() already has a warn message printed out with the help of hid_warn() and value in question gets trimmed with: ... value &= m; ... WARN_ON may be considered superfluous. Remove it to suppress future syzkaller triggers. [1] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 implement drivers/hid/hid-core.c:1451 [inline] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 Modules linked in: CPU: 0 PID: 5084 Comm: syz-executor424 Not tainted 6.9.0-rc7-syzkaller-00183-gcf87f46fd34d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:implement drivers/hid/hid-core.c:1451 [inline] RIP: 0010:hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 ... Call Trace: __usbhid_submit_report drivers/hid/usbhid/hid-core.c:591 [inline] usbhid_submit_report+0x43d/0x9e0 drivers/hid/usbhid/hid-core.c:636 hiddev_ioctl+0x138b/0x1f00 drivers/hid/usbhid/hiddev.c:726 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... Fixes: 95d1c8951e5b ("HID: simplify implement() a bit") Reported-by: Suggested-by: Alan Stern Signed-off-by: Nikita Zhandarovich Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin commit 17a6806f606feae3230879ba73f875e03728acba Author: Matthias Schiffer Date: Thu May 30 12:20:02 2024 +0200 gpio: tqmx86: fix broken IRQ_TYPE_EDGE_BOTH interrupt type [ Upstream commit 90dd7de4ef7ba584823dfbeba834c2919a4bb55b ] The TQMx86 GPIO controller only supports falling and rising edge triggers, but not both. Fix this by implementing a software both-edge mode that toggles the edge type after every interrupt. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Co-developed-by: Gregor Herburger Signed-off-by: Gregor Herburger Signed-off-by: Matthias Schiffer Link: https://lore.kernel.org/r/515324f0491c4d44f4ef49f170354aca002d81ef.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin commit 12043e85bd71a5c4ad0e921dd5b76a27934c1e0b Author: Matthias Schiffer Date: Thu May 30 12:20:01 2024 +0200 gpio: tqmx86: store IRQ trigger type and unmask status separately [ Upstream commit 08af509efdf8dad08e972b48de0e2c2a7919ea8b ] irq_set_type() should not implicitly unmask the IRQ. All accesses to the interrupt configuration register are moved to a new helper tqmx86_gpio_irq_config(). We also introduce the new rule that accessing irq_type must happen while locked, which will become significant for fixing EDGE_BOTH handling. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Matthias Schiffer Link: https://lore.kernel.org/r/6aa4f207f77cb58ef64ffb947e91949b0f753ccd.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin commit 01aa7b7a3dd70754162f56441377f6cfc3bff155 Author: Matthias Schiffer Date: Thu May 30 12:20:00 2024 +0200 gpio: tqmx86: introduce shadow register for GPIO output value [ Upstream commit 9d6a811b522ba558bcb4ec01d12e72a0af8e9f6e ] The TQMx86 GPIO controller uses the same register address for input and output data. Reading the register will always return current inputs rather than the previously set outputs (regardless of the current direction setting). Therefore, using a RMW pattern does not make sense when setting output values. Instead, the previously set output register value needs to be stored as a shadow register. As there is no reliable way to get the current output values from the hardware, also initialize all channels to 0, to ensure that stored and actual output values match. This should usually not have any effect in practise, as the TQMx86 UEFI sets all outputs to 0 during boot. Also prepare for extension of the driver to more than 8 GPIOs by using DECLARE_BITMAP. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Matthias Schiffer Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/d0555933becd45fa92a85675d26e4d59343ddc01.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin commit d690790108132cf1b58bd3f5d476b52c286285bb Author: Gregor Herburger Date: Thu May 30 12:19:59 2024 +0200 gpio: tqmx86: fix typo in Kconfig label [ Upstream commit 8c219e52ca4d9a67cd6a7074e91bf29b55edc075 ] Fix description for GPIO_TQMX86 from QTMX86 to TQMx86. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Gregor Herburger Signed-off-by: Matthias Schiffer Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/e0e38c9944ad6d281d9a662a45d289b88edc808e.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin commit b07a62f8c84311760e1e19e19c9114abb960051e Author: Armin Wolf Date: Tue May 28 22:49:02 2024 +0200 platform/x86: dell-smbios: Fix wrong token data in sysfs [ Upstream commit 1981b296f858010eae409548fd297659b2cc570e ] When reading token data from sysfs on my Inspiron 3505, the token locations and values are wrong. This happens because match_attribute() blindly assumes that all entries in da_tokens have an associated entry in token_attrs. This however is not true as soon as da_tokens[] contains zeroed token entries. Those entries are being skipped when initialising token_attrs, breaking the core assumption of match_attribute(). Fix this by defining an extra struct for each pair of token attributes and use container_of() to retrieve token information. Tested on a Dell Inspiron 3050. Fixes: 33b9ca1e53b4 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens") Signed-off-by: Armin Wolf Reviewed-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20240528204903.445546-1-W_Armin@gmx.de Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede Signed-off-by: Sasha Levin commit 2c82e21bbc0502299c3d59460fecaf94082757c5 Author: Chen Ni Date: Tue May 28 11:08:32 2024 +0800 drm/panel: sitronix-st7789v: Add check for of_drm_get_panel_orientation [ Upstream commit 629f2b4e05225e53125aaf7ff0b87d5d53897128 ] Add check for the return value of of_drm_get_panel_orientation() and return the error if it fails in order to catch the error. Fixes: b27c0f6d208d ("drm/panel: sitronix-st7789v: add panel orientation support") Signed-off-by: Chen Ni Reviewed-by: Michael Riesch Acked-by: Jessica Zhang Link: https://lore.kernel.org/r/20240528030832.2529471-1-nichen@iscas.ac.cn Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20240528030832.2529471-1-nichen@iscas.ac.cn Signed-off-by: Sasha Levin commit ca060e25579457d0fedf92f7ac8cac8c28e307ac Author: Weiwen Hu Date: Thu May 30 14:16:46 2024 +0800 nvme: fix nvme_pr_* status code parsing [ Upstream commit b1a1fdd7096dd2d67911b07f8118ff113d815db4 ] Fix the parsing if extra status bits (e.g. MORE) is present. Fixes: 7fb42780d06c ("nvme: Convert NVMe errors to PR errors") Signed-off-by: Weiwen Hu Signed-off-by: Keith Busch Signed-off-by: Sasha Levin commit beb2dde5e1b96000b37e742dd6bd540ba39f695e Author: Masami Hiramatsu (Google) Date: Fri May 31 18:43:37 2024 +0900 selftests/tracing: Fix event filter test to retry up to 10 times [ Upstream commit 0f42bdf59b4e428485aa922bef871bfa6cc505e0 ] Commit eb50d0f250e9 ("selftests/ftrace: Choose target function for filter test from samples") choose the target function from samples, but sometimes this test failes randomly because the target function does not hit at the next time. So retry getting samples up to 10 times. Fixes: eb50d0f250e9 ("selftests/ftrace: Choose target function for filter test from samples") Signed-off-by: Masami Hiramatsu (Google) Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin commit b21cae4688490b4df104e214f9647d7ab472856b Author: NeilBrown Date: Tue May 28 13:27:17 2024 +1000 NFS: add barriers when testing for NFS_FSDATA_BLOCKED [ Upstream commit 99bc9f2eb3f79a2b4296d9bf43153e1d10ca50d3 ] dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or renaming-over a file to ensure that no open succeeds while the NFS operation progressed on the server. Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under ->d_lock after checking the refcount is not elevated. Any attempt to open the file (through that name) will go through lookp_open() which will take ->d_lock while incrementing the refcount, we can be sure that once the new value is set, __nfs_lookup_revalidate() *will* see the new value and will block. We don't have any locking guarantee that when we set ->d_fsdata to NULL, the wait_var_event() in __nfs_lookup_revalidate() will notice. wait/wake primitives do NOT provide barriers to guarantee order. We must use smp_load_acquire() in wait_var_event() to ensure we look at an up-to-date value, and must use smp_store_release() before wake_up_var(). This patch adds those barrier functions and factors out block_revalidate() and unblock_revalidate() far clarity. There is also a hypothetical bug in that if memory allocation fails (which never happens in practice) we might leave ->d_fsdata locked. This patch adds the missing call to unblock_revalidate(). Reported-and-tested-by: Richard Kojedzinszky Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501 Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit b1a6e884de15b3b4cb288d3aa23902cb847699a6 Author: Chen Hanxiao Date: Thu May 23 16:47:16 2024 +0800 SUNRPC: return proper error from gss_wrap_req_priv [ Upstream commit 33c94d7e3cb84f6d130678d6d59ba475a6c489cf ] don't return 0 if snd_buf->len really greater than snd_buf->buflen Signed-off-by: Chen Hanxiao Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in rpc_auth_gss.ko") Reviewed-by: Benjamin Coddington Reviewed-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit c30988e06b155a8f09bbf6e6681f142471eec5a9 Author: Olga Kornievskaia Date: Wed May 29 15:44:35 2024 -0400 NFSv4.1 enforce rootpath check in fs_location query [ Upstream commit 28568c906c1bb5f7560e18082ed7d6295860f1c2 ] In commit 4ca9f31a2be66 ("NFSv4.1 test and add 4.1 trunking transport"), we introduce the ability to query the NFS server for possible trunking locations of the existing filesystem. However, we never checked the returned file system path for these alternative locations. According to the RFC, the server can say that the filesystem currently known under "fs_root" of fs_location also resides under these server locations under the following "rootpath" pathname. The client cannot handle trunking a filesystem that reside under different location under different paths other than what the main path is. This patch enforces the check that fs_root path and rootpath path in fs_location reply is the same. Fixes: 4ca9f31a2be6 ("NFSv4.1 test and add 4.1 trunking transport") Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit a9aa5a49c8edf895bd5c4becdd359f7fcb2ea0ea Author: Samuel Holland Date: Mon May 27 17:14:12 2024 -0700 clk: sifive: Do not register clkdevs for PRCI clocks [ Upstream commit 2607133196c35f31892ee199ce7ffa717bea4ad1 ] These clkdevs were unnecessary, because systems using this driver always look up clocks using the devicetree. And as Russell King points out[1], since the provided device name was truncated, lookups via clkdev would never match. Recently, commit 8d532528ff6a ("clkdev: report over-sized strings when creating clkdev entries") caused clkdev registration to fail due to the truncation, and this now prevents the driver from probing. Fix the driver by removing the clkdev registration. Link: https://lore.kernel.org/linux-clk/ZkfYqj+OcAxd9O2t@shell.armlinux.org.uk/ [1] Fixes: 30b8e27e3b58 ("clk: sifive: add a driver for the SiFive FU540 PRCI IP block") Fixes: 8d532528ff6a ("clkdev: report over-sized strings when creating clkdev entries") Reported-by: Guenter Roeck Closes: https://lore.kernel.org/linux-clk/7eda7621-0dde-4153-89e4-172e4c095d01@roeck-us.net/ Suggested-by: Russell King Signed-off-by: Samuel Holland Link: https://lore.kernel.org/r/20240528001432.1200403-1-samuel.holland@sifive.com Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit dff9b2238969497519923150fd9e2ad821209096 Author: Masami Hiramatsu (Google) Date: Tue May 21 09:00:22 2024 +0900 selftests/ftrace: Fix to check required event file [ Upstream commit f6c3c83db1d939ebdb8c8922748ae647d8126d91 ] The dynevent/test_duplicates.tc test case uses `syscalls/sys_enter_openat` event for defining eprobe on it. Since this `syscalls` events depend on CONFIG_FTRACE_SYSCALLS=y, if it is not set, the test will fail. Add the event file to `required` line so that the test will return `unsupported` result. Fixes: 297e1dcdca3d ("selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes") Signed-off-by: Masami Hiramatsu (Google) Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin commit 3bf0b8030296e9ee60d3d4c15849ad9ac0b47081 Author: Baokun Li Date: Wed May 22 19:43:07 2024 +0800 cachefiles: flush all requests after setting CACHEFILES_DEAD [ Upstream commit 85e833cd7243bda7285492b0653c3abb1e2e757b ] In ondemand mode, when the daemon is processing an open request, if the kernel flags the cache as CACHEFILES_DEAD, the cachefiles_daemon_write() will always return -EIO, so the daemon can't pass the copen to the kernel. Then the kernel process that is waiting for the copen triggers a hung_task. Since the DEAD state is irreversible, it can only be exited by closing /dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark the cache as CACHEFILES_DEAD, if in ondemand mode, flush all requests to avoid the above hungtask. We may still be able to read some of the cached data before closing the fd of /dev/cachefiles. Note that this relies on the patch that adds reference counting to the req, otherwise it may UAF. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-12-libaokun@huaweicloud.com Acked-by: Jeff Layton Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin commit d2d3eb377a5d081bf2bed177d354a4f59b74da88 Author: Baokun Li Date: Wed May 22 19:43:05 2024 +0800 cachefiles: defer exposing anon_fd until after copy_to_user() succeeds [ Upstream commit 4b4391e77a6bf24cba2ef1590e113d9b73b11039 ] After installing the anonymous fd, we can now see it in userland and close it. However, at this point we may not have gotten the reference count of the cache, but we will put it during colse fd, so this may cause a cache UAF. So grab the cache reference count before fd_install(). In addition, by kernel convention, fd is taken over by the user land after fd_install(), and the kernel should not call close_fd() after that, i.e., it should call fd_install() after everything is ready, thus fd_install() is called after copy_to_user() succeeds. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Suggested-by: Hou Tao Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-10-libaokun@huaweicloud.com Acked-by: Jeff Layton Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin commit 527db1cb4cd66cde00054b2893700a3032cbfef3 Author: Baokun Li Date: Wed May 22 19:43:04 2024 +0800 cachefiles: never get a new anonymous fd if ondemand_id is valid [ Upstream commit 4988e35e95fc938bdde0e15880fe72042fc86acf ] Now every time the daemon reads an open request, it gets a new anonymous fd and ondemand_id. With the introduction of "restore", it is possible to read the same open request more than once, and therefore an object can have more than one anonymous fd. If the anonymous fd is not unique, the following concurrencies will result in an fd leak: t1 | t2 | t3 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd load->fd = fd0 ondemand_id = object_id0 ------ restore ------ cachefiles_ondemand_restore // restore REQ_A cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd load->fd = fd1 ondemand_id = object_id1 process_open_req(REQ_A) write(devfd, ("copen %u,%llu", msg->msg_id, size)) cachefiles_ondemand_copen xa_erase(&cache->reqs, id) complete(&REQ_A->done) kfree(REQ_A) process_open_req(REQ_A) // copen fails due to no req // daemon close(fd1) cachefiles_ondemand_fd_release // set object closed -- umount -- cachefiles_withdraw_cookie cachefiles_ondemand_clean_object cachefiles_ondemand_init_close_req if (!cachefiles_ondemand_object_is_open(object)) return -ENOENT; // The fd0 is not closed until the daemon exits. However, the anonymous fd holds the reference count of the object and the object holds the reference count of the cookie. So even though the cookie has been relinquished, it will not be unhashed and freed until the daemon exits. In fscache_hash_cookie(), when the same cookie is found in the hash list, if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new cookie waits for the old cookie to be unhashed, while the old cookie is waiting for the leaked fd to be closed, if the daemon does not exit in time it will trigger a hung task. To avoid this, allocate a new anonymous fd only if no anonymous fd has been allocated (ondemand_id == 0) or if the previously allocated anonymous fd has been closed (ondemand_id == -1). Moreover, returns an error if ondemand_id is valid, letting the daemon know that the current userland restore logic is abnormal and needs to be checked. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-9-libaokun@huaweicloud.com Acked-by: Jeff Layton Signed-off-by: Christian Brauner Stable-dep-of: 4b4391e77a6b ("cachefiles: defer exposing anon_fd until after copy_to_user() succeeds") Signed-off-by: Sasha Levin commit 1d95e5010ce85c51d2de2ed83d2bfdafe399a26d Author: Baokun Li Date: Wed May 22 19:43:01 2024 +0800 cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read() [ Upstream commit 3e6d704f02aa4c50c7bc5fe91a4401df249a137b ] The err_put_fd label is only used once, so remove it to make the code more readable. In addition, the logic for deleting error request and CLOSE request is merged to simplify the code. Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-6-libaokun@huaweicloud.com Acked-by: Jeff Layton Reviewed-by: Jia Zhu Reviewed-by: Gao Xiang Reviewed-by: Jingbo Xu Signed-off-by: Christian Brauner Stable-dep-of: 4b4391e77a6b ("cachefiles: defer exposing anon_fd until after copy_to_user() succeeds") Signed-off-by: Sasha Levin commit 3958679c49152391209b32be3357193300a51abd Author: Baokun Li Date: Wed May 22 19:43:00 2024 +0800 cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() [ Upstream commit da4a827416066191aafeeccee50a8836a826ba10 ] We got the following issue in a fuzz test of randomly issuing the restore command: ================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60 Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963 CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564 Call Trace: kasan_report+0x93/0xc0 cachefiles_ondemand_daemon_read+0xb41/0xb60 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0 Allocated by task 116: kmem_cache_alloc+0x140/0x3a0 cachefiles_lookup_cookie+0x140/0xcd0 fscache_cookie_state_machine+0x43c/0x1230 [...] Freed by task 792: kmem_cache_free+0xfe/0x390 cachefiles_put_object+0x241/0x480 fscache_cookie_state_machine+0x5c8/0x1230 [...] ================================================================== Following is the process that triggers the issue: mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_withdraw_cookie cachefiles_ondemand_clean_object(object) cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req msg->object_id = req->object->ondemand->ondemand_id ------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW) cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req copy_to_user(_buffer, msg, n) xa_erase(&cache->reqs, id) complete(&REQ_A->done) ------ close(fd) ------ cachefiles_ondemand_fd_release cachefiles_put_object cachefiles_put_object kmem_cache_free(cachefiles_object_jar, object) REQ_A->object->ondemand->ondemand_id // object UAF !!! When we see the request within xa_lock, req->object must not have been freed yet, so grab the reference count of object before xa_unlock to avoid the above issue. Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-5-libaokun@huaweicloud.com Acked-by: Jeff Layton Reviewed-by: Jia Zhu Reviewed-by: Jingbo Xu Signed-off-by: Christian Brauner Stable-dep-of: 4b4391e77a6b ("cachefiles: defer exposing anon_fd until after copy_to_user() succeeds") Signed-off-by: Sasha Levin commit a6de82765e12fb1201ab607f0d3ffe3309b30fc0 Author: Baokun Li Date: Wed May 22 19:42:59 2024 +0800 cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() [ Upstream commit de3e26f9e5b76fc628077578c001c4a51bf54d06 ] We got the following issue in a fuzz test of randomly issuing the restore command: ================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0 Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962 CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542 Call Trace: kasan_report+0x94/0xc0 cachefiles_ondemand_daemon_read+0x609/0xab0 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0 Allocated by task 626: __kmalloc+0x1df/0x4b0 cachefiles_ondemand_send_req+0x24d/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...] Freed by task 626: kfree+0xf1/0x2c0 cachefiles_ondemand_send_req+0x568/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...] ================================================================== Following is the process that triggers the issue: mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd copy_to_user(_buffer, msg, n) process_open_req(REQ_A) ------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW); cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req write(devfd, ("copen %u,%llu", msg->msg_id, size)); cachefiles_ondemand_copen xa_erase(&cache->reqs, id) complete(&REQ_A->done) kfree(REQ_A) cachefiles_ondemand_get_fd(REQ_A) fd = get_unused_fd_flags file = anon_inode_getfile fd_install(fd, file) load = (void *)REQ_A->msg.data; load->fd = fd; // load UAF !!! This issue is caused by issuing a restore command when the daemon is still alive, which results in a request being processed multiple times thus triggering a UAF. So to avoid this problem, add an additional reference count to cachefiles_req, which is held while waiting and reading, and then released when the waiting and reading is over. Note that since there is only one reference count for waiting, we need to avoid the same request being completed multiple times, so we can only complete the request if it is successfully removed from the xarray. Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests") Suggested-by: Hou Tao Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-4-libaokun@huaweicloud.com Acked-by: Jeff Layton Reviewed-by: Jia Zhu Reviewed-by: Jingbo Xu Signed-off-by: Christian Brauner Stable-dep-of: 4b4391e77a6b ("cachefiles: defer exposing anon_fd until after copy_to_user() succeeds") Signed-off-by: Sasha Levin commit 9f5fa40f0924e9de85b16c6d1aea80327ce647d8 Author: Jia Zhu Date: Mon Nov 20 12:14:22 2023 +0800 cachefiles: add restore command to recover inflight ondemand read requests [ Upstream commit e73fa11a356ca0905c3cc648eaacc6f0f2d2c8b3 ] Previously, in ondemand read scenario, if the anonymous fd was closed by user daemon, inflight and subsequent read requests would return EIO. As long as the device connection is not released, user daemon can hold and restore inflight requests by setting the request flag to CACHEFILES_REQ_NEW. Suggested-by: Gao Xiang Signed-off-by: Jia Zhu Signed-off-by: Xin Yin Link: https://lore.kernel.org/r/20231120041422.75170-6-zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu Reviewed-by: David Howells Signed-off-by: Christian Brauner Stable-dep-of: 4b4391e77a6b ("cachefiles: defer exposing anon_fd until after copy_to_user() succeeds") Signed-off-by: Sasha Levin commit e564e48ca299a5350e9f4182e29be8bef17856d6 Author: Baokun Li Date: Wed May 22 19:43:03 2024 +0800 cachefiles: add spin_lock for cachefiles_ondemand_info [ Upstream commit 0a790040838c736495d5afd6b2d636f159f817f1 ] The following concurrency may cause a read request to fail to be completed and result in a hung: t1 | t2 --------------------------------------------------------- cachefiles_ondemand_copen req = xa_erase(&cache->reqs, id) // Anon fd is maliciously closed. cachefiles_ondemand_fd_release xa_lock(&cache->reqs) cachefiles_ondemand_set_object_close(object) xa_unlock(&cache->reqs) cachefiles_ondemand_set_object_open // No one will ever close it again. cachefiles_ondemand_daemon_read cachefiles_ondemand_select_req // Get a read req but its fd is already closed. // The daemon can't issue a cread ioctl with an closed fd, then hung. So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and state, thus we can avoid the above problem in cachefiles_ondemand_copen() by using ondemand_id to determine if fd has been closed. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-8-libaokun@huaweicloud.com Acked-by: Jeff Layton Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin commit f740fd943bb1fbf79b7eaba3c71eb7536f437f51 Author: Jia Zhu Date: Mon Nov 20 12:14:20 2023 +0800 cachefiles: resend an open request if the read request's object is closed [ Upstream commit 0a7e54c1959c0feb2de23397ec09c7692364313e ] When an anonymous fd is closed by user daemon, if there is a new read request for this file comes up, the anonymous fd should be re-opened to handle that read request rather than fail it directly. 1. Introduce reopening state for objects that are closed but have inflight/subsequent read requests. 2. No longer flush READ requests but only CLOSE requests when anonymous fd is closed. 3. Enqueue the reopen work to workqueue, thus user daemon could get rid of daemon_read context and handle that request smoothly. Otherwise, the user daemon will send a reopen request and wait for itself to process the request. Signed-off-by: Jia Zhu Link: https://lore.kernel.org/r/20231120041422.75170-4-zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu Reviewed-by: David Howells Signed-off-by: Christian Brauner Stable-dep-of: 0a790040838c ("cachefiles: add spin_lock for cachefiles_ondemand_info") Signed-off-by: Sasha Levin commit 33d21f0658cf5ea7bd464f50f9670bfb08ae12f2 Author: Jia Zhu Date: Mon Nov 20 12:14:19 2023 +0800 cachefiles: extract ondemand info field from cachefiles_object [ Upstream commit 3c5ecfe16e7699011c12c2d44e55437415331fa3 ] We'll introduce a @work_struct field for @object in subsequent patches, it will enlarge the size of @object. As the result of that, this commit extracts ondemand info field from @object. Signed-off-by: Jia Zhu Link: https://lore.kernel.org/r/20231120041422.75170-3-zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu Reviewed-by: David Howells Signed-off-by: Christian Brauner Stable-dep-of: 0a790040838c ("cachefiles: add spin_lock for cachefiles_ondemand_info") Signed-off-by: Sasha Levin commit 955190e1851afb386309e6affcf1a127a9ea0204 Author: Jia Zhu Date: Mon Nov 20 12:14:18 2023 +0800 cachefiles: introduce object ondemand state [ Upstream commit 357a18d033143617e9c7d420c8f0dd4cbab5f34d ] Previously, @ondemand_id field was used not only to identify ondemand state of the object, but also to represent the index of the xarray. This commit introduces @state field to decouple the role of @ondemand_id and adds helpers to access it. Signed-off-by: Jia Zhu Link: https://lore.kernel.org/r/20231120041422.75170-2-zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu Reviewed-by: David Howells Signed-off-by: Christian Brauner Stable-dep-of: 0a790040838c ("cachefiles: add spin_lock for cachefiles_ondemand_info") Signed-off-by: Sasha Levin commit 50d0e55356ba5b84ffb51c42704126124257e598 Author: Baokun Li Date: Wed May 22 19:42:58 2024 +0800 cachefiles: remove requests from xarray during flushing requests [ Upstream commit 0fc75c5940fa634d84e64c93bfc388e1274ed013 ] Even with CACHEFILES_DEAD set, we can still read the requests, so in the following concurrency the request may be used after it has been freed: mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read // close dev fd cachefiles_flush_reqs complete(&REQ_A->done) kfree(REQ_A) xa_lock(&cache->reqs); cachefiles_ondemand_select_req req->msg.opcode != CACHEFILES_OP_READ // req use-after-free !!! xa_unlock(&cache->reqs); xa_destroy(&cache->reqs) Hence remove requests from cache->reqs when flushing them to avoid accessing freed requests. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-3-libaokun@huaweicloud.com Acked-by: Jeff Layton Reviewed-by: Jia Zhu Reviewed-by: Gao Xiang Reviewed-by: Jingbo Xu Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin commit 19133f53f1991432aab6fa9cdedad4b45568256a Author: Baokun Li Date: Wed May 22 19:42:57 2024 +0800 cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd [ Upstream commit cc5ac966f26193ab185cc43d64d9f1ae998ccb6e ] This lets us see the correct trace output. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li Link: https://lore.kernel.org/r/20240522114308.2402121-2-libaokun@huaweicloud.com Acked-by: Jeff Layton Reviewed-by: Jingbo Xu Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin commit d8316838aa0686da63a8be4194b7a17b0103ae4a Author: Li Zhijian Date: Tue May 7 13:34:21 2024 +0800 cxl/region: Fix memregion leaks in devm_cxl_add_region() [ Upstream commit 49ba7b515c4c0719b866d16f068e62d16a8a3dd1 ] Move the mode verification to __create_region() before allocating the memregion to avoid the memregion leaks. Fixes: 6e099264185d ("cxl/region: Add volatile region creation support") Signed-off-by: Li Zhijian Reviewed-by: Dan Williams Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin commit 09b4aa2815bf9f0f18c26de650db6abaaf751105 Author: Dave Jiang Date: Tue May 28 15:55:51 2024 -0700 cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c [ Upstream commit d55510527153d17a3af8cc2df69c04f95ae1350d ] tools/testing/cxl/test/mem.c uses vmalloc() and vfree() but does not include linux/vmalloc.h. Kernel v6.10 made changes that causes the currently included headers not depend on vmalloc.h and therefore mem.c can no longer compile. Add linux/vmalloc.h to fix compile issue. CC [M] tools/testing/cxl/test/mem.o tools/testing/cxl/test/mem.c: In function ‘label_area_release’: tools/testing/cxl/test/mem.c:1428:9: error: implicit declaration of function ‘vfree’; did you mean ‘kvfree’? [-Werror=implicit-function-declaration] 1428 | vfree(lsa); | ^~~~~ | kvfree tools/testing/cxl/test/mem.c: In function ‘cxl_mock_mem_probe’: tools/testing/cxl/test/mem.c:1466:22: error: implicit declaration of function ‘vmalloc’; did you mean ‘kmalloc’? [-Werror=implicit-function-declaration] 1466 | mdata->lsa = vmalloc(LSA_SIZE); | ^~~~~~~ | kmalloc Fixes: 7d3eb23c4ccf ("tools/testing/cxl: Introduce a mock memory device + driver") Reviewed-by: Dan Williams Reviewed-by: Alison Schofield Link: https://lore.kernel.org/r/20240528225551.1025977-1-dave.jiang@intel.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin commit b3f206985a33fae523e80a0342087bb160daea42 Author: Chen Ni Date: Wed May 15 11:30:51 2024 +0800 HID: nvidia-shield: Add missing check for input_ff_create_memless [ Upstream commit 0a3f9f7fc59feb8a91a2793b8b60977895c72365 ] Add check for the return value of input_ff_create_memless() and return the error if it fails in order to catch the error. Fixes: 09308562d4af ("HID: nvidia-shield: Initial driver implementation with Thunderstrike support") Signed-off-by: Chen Ni Reviewed-by: Rahul Rameshbabu Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin commit af4cff0dd6404d64f7f09d52bf6b1bb6545f0ab1 Author: Michael Ellerman Date: Wed May 29 22:30:28 2024 +1000 powerpc/uaccess: Fix build errors seen with GCC 13/14 commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream. Building ppc64le_defconfig with GCC 14 fails with assembler errors: CC fs/readdir.o /tmp/ccdQn0mD.s: Assembler messages: /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4) /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4) ... [6 lines] /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4) A snippet of the asm shows: # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end); ld 9,0(29) # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1] # 210 "../fs/readdir.c" 1 1: std 9,18(8) # put_user # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1] The 'std' instruction requires a 4-byte aligned displacement because it is a DS-form instruction, and as the assembler says, 18 is not a multiple of 4. A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y. The fix is to change the constraint on the memory operand to put_user(), from "m" which is a general memory reference to "YZ". The "Z" constraint is documented in the GCC manual PowerPC machine constraints, and specifies a "memory operand accessed with indexed or indirect addressing". "Y" is not documented in the manual but specifies a "memory operand for a DS-form instruction". Using both allows the compiler to generate a DS-form "std" or X-form "stdx" as appropriate. The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because the "Y" constraint does not guarantee 4-byte alignment when prefixed instructions are enabled. Unfortunately clang doesn't support the "Y" constraint so that has to be behind an ifdef. Although the build error is only seen with GCC 13/14, that appears to just be luck. The constraint has been incorrect since it was first added. Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Cc: stable@vger.kernel.org # v5.10+ Suggested-by: Kewen Lin Signed-off-by: Michael Ellerman Link: https://msgid.link/20240529123029.146953-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman commit 2ce5341c36993b776012601921d7688693f8c037 Author: Ziwei Xiao Date: Wed Jun 12 00:16:54 2024 +0000 gve: Clear napi->skb before dev_kfree_skb_any() commit 6f4d93b78ade0a4c2cafd587f7b429ce95abb02e upstream. gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it is freed with dev_kfree_skb_any(). This can result in a subsequent call to napi_get_frags returning a dangling pointer. Fix this by clearing napi->skb before the skb is freed. Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Cc: stable@vger.kernel.org Reported-by: Shailend Chand Signed-off-by: Ziwei Xiao Reviewed-by: Harshitha Ramamurthy Reviewed-by: Shailend Chand Reviewed-by: Praveen Kaligineedi Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 323d2563bde0ab5819e17bf2a44f6ba847e97edb Author: Martin K. Petersen Date: Tue Jun 4 22:25:21 2024 -0400 scsi: sd: Use READ(16) when reading block zero on large capacity disks commit 7926d51f73e0434a6250c2fd1a0555f98d9a62da upstream. Commit 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") triggered a read to LBA 0 before attempting to inquire about device characteristics. This was done because some protocol bridge devices will return generic values until an attached storage device's media has been accessed. Pierre Tomon reported that this change caused problems on a large capacity external drive connected via a bridge device. The bridge in question does not appear to implement the READ(10) command. Issue a READ(16) instead of READ(10) when a device has been identified as preferring 16-byte commands (use_16_for_rw heuristic). Link: https://bugzilla.kernel.org/show_bug.cgi?id=218890 Link: https://lore.kernel.org/r/70dd7ae0-b6b1-48e1-bb59-53b7c7f18274@rowland.harvard.edu Link: https://lore.kernel.org/r/20240605022521.3960956-1-martin.petersen@oracle.com Fixes: 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") Cc: stable@vger.kernel.org Reported-by: Pierre Tomon Suggested-by: Alan Stern Tested-by: Pierre Tomon Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 9079338c5a0d1f1fee34fb1c9e99b754efe414c5 Author: Breno Leitao Date: Wed Jun 5 01:55:29 2024 -0700 scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory commit 4254dfeda82f20844299dca6c38cbffcfd499f41 upstream. There is a potential out-of-bounds access when using test_bit() on a single word. The test_bit() and set_bit() functions operate on long values, and when testing or setting a single word, they can exceed the word boundary. KASAN detects this issue and produces a dump: BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965 For full log, please look at [1]. Make the allocation at least the size of sizeof(unsigned long) so that set_bit() and test_bit() have sufficient room for read/write operations without overwriting unallocated memory. [1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/ Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path") Cc: stable@vger.kernel.org Suggested-by: Keith Busch Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org Reviewed-by: Keith Busch Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit da097dccaece0f865359ac37879eabfe5aad21e7 Author: Damien Le Moal Date: Tue Jun 11 17:34:35 2024 +0900 scsi: mpi3mr: Fix ATA NCQ priority support commit 90e6f08915ec6efe46570420412a65050ec826b2 upstream. The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to the HBA if a read or write command directed at an ATA device should be translated to an NCQ read/write command with the high prioiryt bit set when the request uses the RT priority class and the user has enabled NCQ priority through sysfs. However, unlike the mpt3sas driver, the mpi3mr driver does not define the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never actually set and NCQ Priority cannot ever be used. Fix this by defining these missing atributes to allow a user to check if an ATA device supports NCQ priority and to enable/disable the use of NCQ priority. To do this, lift the function scsih_ncq_prio_supp() out of the mpt3sas driver and make it the generic SCSI SAS transport function sas_ata_ncq_prio_supported(). Nothing in that function is hardware specific, so this function can be used in both the mpt3sas driver and the mpi3mr driver. Reported-by: Scott McCoy Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 3e9785d3e92b59463814c06ef17f30862a076346 Author: Damien Le Moal Date: Fri Jun 7 10:25:07 2024 +0900 scsi: core: Disable CDL by default commit 52912ca87e2b810e5acdcdc452593d30c9187d8f upstream. For SCSI devices supporting the Command Duration Limits feature set, the user can enable/disable this feature use through the sysfs device attribute "cdl_enable". This attribute modification triggers a call to scsi_cdl_enable() to enable and disable the feature for ATA devices and set the scsi device cdl_enable field to the user provided bool value. For SCSI devices supporting CDL, the feature set is always enabled and scsi_cdl_enable() is reduced to setting the cdl_enable field. However, for ATA devices, a drive may spin-up with the CDL feature enabled by default. But the SCSI device cdl_enable field is always initialized to false (CDL disabled), regardless of the actual device CDL feature state. For ATA devices managed by libata (or libsas), libata-core always disables the CDL feature set when the device is attached, thus syncing the state of the CDL feature on the device and of the SCSI device cdl_enable field. However, for ATA devices connected to a SAS HBA, the CDL feature is not disabled on scan for ATA devices that have this feature enabled by default, leading to an inconsistent state of the feature on the device with the SCSI device cdl_enable field. Avoid this inconsistency by adding a call to scsi_cdl_enable() in scsi_cdl_check() to make sure that the device-side state of the CDL feature set always matches the scsi device cdl_enable field state. This implies that CDL will always be disabled for ATA devices connected to SAS HBAs, which is consistent with libata/libsas initialization of the device. Reported-by: Scott McCoy Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel Reviewed-by: Igor Pylypiv Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit d5ceeb0b6a716754f4aa47cff3ed9da0487d8ca7 Author: Aapo Vienamo Date: Fri May 24 18:53:17 2024 +0300 thunderbolt: debugfs: Fix margin debugfs node creation condition commit 985cfe501b74f214905ab4817acee0df24627268 upstream. The margin debugfs node controls the "Enable Margin Test" field of the lane margining operations. This field selects between either low or high voltage margin values for voltage margin test or left or right timing margin values for timing margin test. According to the USB4 specification, whether or not the "Enable Margin Test" control applies, depends on the values of the "Independent High/Low Voltage Margin" or "Independent Left/Right Timing Margin" capability fields for voltage and timing margin tests respectively. The pre-existing condition enabled the debugfs node also in the case where both low/high or left/right margins are returned, which is incorrect. This change only enables the debugfs node in question, if the specific required capability values are met. Signed-off-by: Aapo Vienamo Fixes: d0f1e0c2a699 ("thunderbolt: Add support for receiver lane margining") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman commit d4121290b42703039f27bd68c8ca80c854b44261 Author: Kuangyi Chiang Date: Tue Jun 11 15:06:09 2024 +0300 xhci: Apply broken streams quirk to Etron EJ188 xHCI host commit 91f7a1524a92c70ffe264db8bdfa075f15bbbeb9 upstream. As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188 as well. Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 949be4ec5835e0ccb3e2a8ab0e46179cb5512518 Author: Hector Martin Date: Tue Jun 11 15:06:10 2024 +0300 xhci: Handle TD clearing for multiple streams case commit 5ceac4402f5d975e5a01c806438eb4e554771577 upstream. When multiple streams are in use, multiple TDs might be in flight when an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for each, to ensure everything is reset properly and the caches cleared. Change the logic so that any N>1 TDs found active for different streams are deferred until after the first one is processed, calling xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to queue another command until we are done with all of them. Also change the error/"should never happen" paths to ensure we at least clear any affected TDs, even if we can't issue a command to clear the hardware cache, and complain loudly with an xhci_warn() if this ever happens. This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") early on in the XHCI driver's life, when stream support was first added. It was then identified but not fixed nor made into a warning in commit 674f8438c121 ("xhci: split handling halted endpoints into two steps"), which added a FIXME comment for the problem case (without materially changing the behavior as far as I can tell, though the new logic made the problem more obvious). Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs."), it was acknowledged again. [Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") was a targeted regression fix to the previously mentioned patch. Users reported issues with usb stuck after unmounting/disconnecting UAS devices. This rolled back the TD clearing of multiple streams to its original state.] Apparently the commit author was aware of the problem (yet still chose to submit it): It was still mentioned as a FIXME, an xhci_dbg() was added to log the problem condition, and the remaining issue was mentioned in the commit description. The choice of making the log type xhci_dbg() for what is, at this point, a completely unhandled and known broken condition is puzzling and unfortunate, as it guarantees that no actual users would see the log in production, thereby making it nigh undebuggable (indeed, even if you turn on DEBUG, the message doesn't really hint at there being a problem at all). It took me *months* of random xHC crashes to finally find a reliable repro and be able to do a deep dive debug session, which could all have been avoided had this unhandled, broken condition been actually reported with a warning, as it should have been as a bug intentionally left in unfixed (never mind that it shouldn't have been left in at all). > Another fix to solve clearing the caches of all stream rings with > cancelled TDs is needed, but not as urgent. 3 years after that statement and 14 years after the original bug was introduced, I think it's finally time to fix it. And maybe next time let's not leave bugs unfixed (that are actually worse than the original bug), and let's actually get people to review kernel commits please. Fixes xHC crashes and IOMMU faults with UAS devices when handling errors/faults. Easiest repro is to use `hdparm` to mark an early sector (e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop. At least in the case of JMicron controllers, the read errors end up having to cancel two TDs (for two queued requests to different streams) and the one that didn't get cleared properly ends up faulting the xHC entirely when it tries to access DMA pages that have since been unmapped, referred to by the stale TDs. This normally happens quickly (after two or three loops). After this fix, I left the `cat` in a loop running overnight and experienced no xHC failures, with all read errors recovered properly. Repro'd and tested on an Apple M1 Mac Mini (dwc3 host). On systems without an IOMMU, this bug would instead silently corrupt freed memory, making this a security bug (even on systems with IOMMUs this could silently corrupt memory belonging to other USB devices on the same controller, so it's still a security bug). Given that the kernel autoprobes partition tables, I'm pretty sure a malicious USB device pretending to be a UAS device and reporting an error with the right timing could deliberately trigger a UAF and write to freed memory, with no user action. [Mathias: Commit message and code comment edit, original at:] https://lore.kernel.org/linux-usb/20240524-xhci-streams-v1-1-6b1f13819bea@marcan.st/ Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps") Cc: stable@vger.kernel.org Cc: security@kernel.org Reviewed-by: Neal Gompa Signed-off-by: Hector Martin Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 0a834fb6dbd8dcd8f04fbd43b598e3bd3bd807af Author: Kuangyi Chiang Date: Tue Jun 11 15:06:08 2024 +0300 xhci: Apply reset resume quirk to Etron EJ188 xHCI host commit 17bd54555c2aaecfdb38e2734149f684a73fa584 upstream. As described in commit c877b3b2ad5c ("xhci: Add reset on resume quirk for asrock p67 host"), EJ188 have the same issue as EJ168, where completely dies on resume. So apply XHCI_RESET_ON_RESUME quirk to EJ188 as well. Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 834c57876cc2b5453a0ca7092d50e99ff3aeb74e Author: Mathias Nyman Date: Tue Jun 11 15:06:07 2024 +0300 xhci: Set correct transferred length for cancelled bulk transfers commit f0260589b439e2637ad54a2b25f00a516ef28a57 upstream. The transferred length is set incorrectly for cancelled bulk transfer TDs in case the bulk transfer ring stops on the last transfer block with a 'Stop - Length Invalid' completion code. length essentially ends up being set to the requested length: urb->actual_length = urb->transfer_buffer_length Length for 'Stop - Length Invalid' cases should be the sum of all TRB transfer block lengths up to the one the ring stopped on, _excluding_ the one stopped on. Fix this by always summing up TRB lengths for 'Stop - Length Invalid' bulk cases. This issue was discovered by Alan Stern while debugging https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not solve that bug. Issue is older than 4.10 kernel but fix won't apply to those due to major reworks in that area. Tested-by: Pierre Tomon Cc: stable@vger.kernel.org # v4.10+ Cc: Alan Stern Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 4598233d9748fe4db4e13b9f473588aa25e87d69 Author: Greg Kroah-Hartman Date: Tue May 14 12:06:34 2024 +0200 jfs: xattr: fix buffer overflow for invalid xattr commit 7c55b78818cfb732680c4a72ab270cc2d2ee3d0f upstream. When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer. Fix this all up by properly restricting the size of the debug hex dump in the kernel log. Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com Cc: Dave Kleikamp Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh Signed-off-by: Greg Kroah-Hartman commit cc30d05b34f9a087a6928d09b131f7b491e9ab11 Author: Mickaël Salaün Date: Thu May 16 20:19:34 2024 +0200 landlock: Fix d_parent walk commit 88da52ccd66e65f2e63a6c35c9dff55d448ef4dc upstream. The WARN_ON_ONCE() in collect_domain_accesses() can be triggered when trying to link a root mount point. This cannot work in practice because this directory is mounted, but the VFS check is done after the call to security_path_link(). Do not use source directory's d_parent when the source directory is the mount point. Cc: Günther Noack Cc: Paul Moore Cc: stable@vger.kernel.org Reported-by: syzbot+bf4903dc7e12b18ebc87@syzkaller.appspotmail.com Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Closes: https://lore.kernel.org/r/000000000000553d3f0618198200@google.com Link: https://lore.kernel.org/r/20240516181935.1645983-2-mic@digikod.net [mic: Fix commit message] Signed-off-by: Mickaël Salaün Signed-off-by: Greg Kroah-Hartman commit 3380fa014a89e4f6c0e6dc23bba74a063f0ed30c Author: Douglas Anderson Date: Fri May 31 08:09:18 2024 -0700 serial: port: Don't block system suspend even if bytes are left to xmit commit ca84cd379b45e9b1775b9e026f069a3a886b409d upstream. Recently, suspend testing on sc7180-trogdor based devices has started to sometimes fail with messages like this: port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs port a88000.serial:0.0: PM: failed to suspend: error -16 I could reproduce these problems by logging in via an agetty on the debug serial port (which was _not_ used for kernel console) and running: cat /var/log/messages ...and then (via an SSH session) forcing a few suspend/resume cycles. Tracing through the code and doing some printf()-based debugging shows that the -16 (-EBUSY) comes from the recently added serial_port_runtime_suspend(). The idea of the serial_port_runtime_suspend() function is to prevent the port from being _runtime_ suspended if it still has bytes left to transmit. Having bytes left to transmit isn't a reason to block _system_ suspend, though. If a serdev device in the kernel needs to block system suspend it should block its own suspend and it can use serdev_device_wait_until_sent() to ensure bytes are sent. The DEFINE_RUNTIME_DEV_PM_OPS() used by the serial_port code means that the system suspend function will be pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see that before calling the runtime suspend function we'll call pm_runtime_disable(). This should be a reliable way to detect that we're called from system suspend and that we shouldn't look for busyness. Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") Cc: stable@vger.kernel.org Reviewed-by: Tony Lindgren Signed-off-by: Douglas Anderson Link: https://lore.kernel.org/r/20240531080914.v3.1.I2395e66cf70c6e67d774c56943825c289b9c13e4@changeid Signed-off-by: Greg Kroah-Hartman commit b895a1b981cf529e869490ee8578723a26a8c550 Author: Ilpo Järvinen Date: Tue May 14 17:04:29 2024 +0300 tty: n_tty: Fix buffer offsets when lookahead is used commit b19ab7ee2c4c1ec5f27c18413c3ab63907f7d55c upstream. When lookahead has "consumed" some characters (la_count > 0), n_tty_receive_buf_standard() and n_tty_receive_buf_closing() for characters beyond the la_count are given wrong cp/fp offsets which leads to duplicating and losing some characters. If la_count > 0, correct buffer pointers and make count consistent too (the latter is not strictly necessary to fix the issue but seems more logical to adjust all variables immediately to keep state consistent). Reported-by: Vadym Krevs Fixes: 6bb6fa6908eb ("tty: Implement lookahead to process XON/XOFF timely") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218834 Tested-by: Vadym Krevs Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20240514140429.12087-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit ce356d8d7e912be9d198ea41af68e147b2d6618f Author: Tomas Winkler Date: Tue Jun 4 12:07:28 2024 +0300 mei: me: release irq in mei_me_pci_resume error path commit 283cb234ef95d94c61f59e1cd070cd9499b51292 upstream. The mei_me_pci_resume doesn't release irq on the error path, in case mei_start() fails. Cc: Fixes: 33ec08263147 ("mei: revamp mei reset state machine") Signed-off-by: Tomas Winkler Link: https://lore.kernel.org/r/20240604090728.1027307-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman commit ad47b23e470450cbbb5e221cb52dc951940cf8cc Author: Kyle Tso Date: Mon May 20 23:48:58 2024 +0800 usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state commit fc8fb9eea94d8f476e15f3a4a7addeb16b3b99d6 upstream. Similar to what fixed in Commit a6fe37f428c1 ("usb: typec: tcpm: Skip hard reset when in error recovery"), the handling of the received Hard Reset has to be skipped during TOGGLING state. [ 4086.021288] VBUS off [ 4086.021295] pending state change SNK_READY -> SNK_UNATTACHED @ 650 ms [rev2 NONE_AMS] [ 4086.022113] VBUS VSAFE0V [ 4086.022117] state change SNK_READY -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.022447] VBUS off [ 4086.022450] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.023060] VBUS VSAFE0V [ 4086.023064] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.023070] disable BIST MODE TESTDATA [ 4086.023766] disable vbus discharge ret:0 [ 4086.023911] Setting usb_comm capable false [ 4086.028874] Setting voltage/current limit 0 mV 0 mA [ 4086.028888] polarity 0 [ 4086.030305] Requesting mux state 0, usb-role 0, orientation 0 [ 4086.033539] Start toggling [ 4086.038496] state change SNK_UNATTACHED -> TOGGLING [rev2 NONE_AMS] // This Hard Reset is unexpected [ 4086.038499] Received hard reset [ 4086.038501] state change TOGGLING -> HARD_RESET_START [rev2 HARD_RESET] Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)") Cc: stable@vger.kernel.org Signed-off-by: Kyle Tso Reviewed-by: Heikki Krogerus Link: https://lore.kernel.org/r/20240520154858.1072347-1-kyletso@google.com Signed-off-by: Greg Kroah-Hartman commit 04c05d50fa79a41582f7bde8a1fd4377ae4a39e5 Author: Amit Sunil Dhamne Date: Tue May 14 15:01:31 2024 -0700 usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps commit e7e921918d905544500ca7a95889f898121ba886 upstream. There could be a potential use-after-free case in tcpm_register_source_caps(). This could happen when: * new (say invalid) source caps are advertised * the existing source caps are unregistered * tcpm_register_source_caps() returns with an error as usb_power_delivery_register_capabilities() fails This causes port->partner_source_caps to hold on to the now freed source caps. Reset port->partner_source_caps value to NULL after unregistering existing source caps. Fixes: 230ecdf71a64 ("usb: typec: tcpm: unregister existing source caps before re-registration") Cc: stable@vger.kernel.org Signed-off-by: Amit Sunil Dhamne Reviewed-by: Ondrej Jirman Reviewed-by: Heikki Krogerus Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20240514220134.2143181-1-amitsd@google.com Signed-off-by: Greg Kroah-Hartman commit b641889cc1cfd83abad416201f747955f5b79690 Author: John Ernberg Date: Fri May 17 11:43:52 2024 +0000 USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected commit 8475ffcfb381a77075562207ce08552414a80326 upstream. If no other USB HCDs are selected when compiling a small pure virutal machine, the Xen HCD driver cannot be built. Fix it by traversing down host/ if CONFIG_USB_XEN_HCD is selected. Fixes: 494ed3997d75 ("usb: Introduce Xen pvUSB frontend (xen hcd)") Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: John Ernberg Link: https://lore.kernel.org/r/20240517114345.1190755-1-john.ernberg@actia.se Signed-off-by: Greg Kroah-Hartman commit 72a3fe36cf9f0d030865e571f45a40f9c1e07e8a Author: Alan Stern Date: Thu Jun 13 21:30:43 2024 -0400 USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages commit 22f00812862564b314784167a89f27b444f82a46 upstream. The syzbot fuzzer found that the interrupt-URB completion callback in the cdc-wdm driver was taking too long, and the driver's immediate resubmission of interrupt URBs with -EPROTO status combined with the dummy-hcd emulation to cause a CPU lockup: cdc_wdm 1-1:1.0: nonzero urb status received: -71 cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625] CPU#0 Utilization every 4s during lockup: #1: 98% system, 0% softirq, 3% hardirq, 0% idle #2: 98% system, 0% softirq, 3% hardirq, 0% idle #3: 98% system, 0% softirq, 3% hardirq, 0% idle #4: 98% system, 0% softirq, 3% hardirq, 0% idle #5: 98% system, 1% softirq, 3% hardirq, 0% idle Modules linked in: irq event stamp: 73096 hardirqs last enabled at (73095): [] console_emit_next_record kernel/printk/printk.c:2935 [inline] hardirqs last enabled at (73095): [] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994 hardirqs last disabled at (73096): [] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (73096): [] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (73048): [] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (73048): [] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582 softirqs last disabled at (73043): [] __do_softirq+0x14/0x20 kernel/softirq.c:588 CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G W 6.10.0-rc2-syzkaller-g8867bbd4a056 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Testing showed that the problem did not occur if the two error messages -- the first two lines above -- were removed; apparently adding material to the kernel log takes a surprisingly large amount of time. In any case, the best approach for preventing these lockups and to avoid spamming the log with thousands of error messages per second is to ratelimit the two dev_err() calls. Therefore we replace them with dev_err_ratelimited(). Signed-off-by: Alan Stern Suggested-by: Greg KH Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/ Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/ Fixes: 9908a32e94de ("USB: remove err() macro from usb class drivers") Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowland.harvard.edu/ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman commit 43cfac7b88adedfb26c27834386992650f1642f3 Author: Jens Axboe Date: Sat Jun 1 12:25:35 2024 -0600 io_uring: check for non-NULL file pointer in io_file_can_poll() commit 5fc16fa5f13b3c06fdb959ef262050bd810416a2 upstream. In earlier kernels, it was possible to trigger a NULL pointer dereference off the forced async preparation path, if no file had been assigned. The trace leading to that looks as follows: BUG: kernel NULL pointer dereference, address: 00000000000000b0 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 67 PID: 1633 Comm: buf-ring-invali Not tainted 6.8.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022 RIP: 0010:io_buffer_select+0xc3/0x210 Code: 00 00 48 39 d1 0f 82 ae 00 00 00 48 81 4b 48 00 00 01 00 48 89 73 70 0f b7 50 0c 66 89 53 42 85 ed 0f 85 d2 00 00 00 48 8b 13 <48> 8b 92 b0 00 00 00 48 83 7a 40 00 0f 84 21 01 00 00 4c 8b 20 5b RSP: 0018:ffffb7bec38c7d88 EFLAGS: 00010246 RAX: ffff97af2be61000 RBX: ffff97af234f1700 RCX: 0000000000000040 RDX: 0000000000000000 RSI: ffff97aecfb04820 RDI: ffff97af234f1700 RBP: 0000000000000000 R08: 0000000000200030 R09: 0000000000000020 R10: ffffb7bec38c7dc8 R11: 000000000000c000 R12: ffffb7bec38c7db8 R13: ffff97aecfb05800 R14: ffff97aecfb05800 R15: ffff97af2be5e000 FS: 00007f852f74b740(0000) GS:ffff97b1eeec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b0 CR3: 000000016deab005 CR4: 0000000000370ef0 Call Trace: ? __die+0x1f/0x60 ? page_fault_oops+0x14d/0x420 ? do_user_addr_fault+0x61/0x6a0 ? exc_page_fault+0x6c/0x150 ? asm_exc_page_fault+0x22/0x30 ? io_buffer_select+0xc3/0x210 __io_import_iovec+0xb5/0x120 io_readv_prep_async+0x36/0x70 io_queue_sqe_fallback+0x20/0x260 io_submit_sqes+0x314/0x630 __do_sys_io_uring_enter+0x339/0xbc0 ? __do_sys_io_uring_register+0x11b/0xc50 ? vm_mmap_pgoff+0xce/0x160 do_syscall_64+0x5f/0x180 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x55e0a110a67e Code: ba cc 00 00 00 45 31 c0 44 0f b6 92 d0 00 00 00 31 d2 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 90 89 30 eb a9 0f 1f 40 00 48 8b 42 20 8b 00 a8 06 75 af 85 f6 because the request is marked forced ASYNC and has a bad file fd, and hence takes the forced async prep path. Current kernels with the request async prep cleaned up can no longer hit this issue, but for ease of backporting, let's add this safety check in here too as it really doesn't hurt. For both cases, this will inevitably end with a CQE posted with -EBADF. Cc: stable@vger.kernel.org Fixes: a76c0b31eef5 ("io_uring: commit non-pollable provided mapped buffers upfront") Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 0c9df3df0c888d9ec8d11a68474a4aa04d371cff Author: Pavel Begunkov Date: Wed Jun 12 13:56:38 2024 +0100 io_uring/rsrc: don't lock while !TASK_RUNNING commit 54559642b96116b45e4b5ca7fd9f7835b8561272 upstream. There is a report of io_rsrc_ref_quiesce() locking a mutex while not TASK_RUNNING, which is due to forgetting restoring the state back after io_run_task_work_sig() and attempts to break out of the waiting loop. do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait+0xa4/0x380 kernel/sched/wait.c:237 WARNING: CPU: 2 PID: 397056 at kernel/sched/core.c:10099 __might_sleep+0x114/0x160 kernel/sched/core.c:10099 RIP: 0010:__might_sleep+0x114/0x160 kernel/sched/core.c:10099 Call Trace: __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0xb4/0x940 kernel/locking/mutex.c:752 io_rsrc_ref_quiesce+0x590/0x940 io_uring/rsrc.c:253 io_sqe_buffers_unregister+0xa2/0x340 io_uring/rsrc.c:799 __io_uring_register io_uring/register.c:424 [inline] __do_sys_io_uring_register+0x5b9/0x2400 io_uring/register.c:613 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd8/0x270 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 Reported-by: Li Shi Fixes: 4ea15b56f0810 ("io_uring/rsrc: use wq for quiescing") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/77966bc104e25b0534995d5dbb152332bc8f31c0.1718196953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit d18b05eda7fa77f02114f15b02c009f28ee42346 Author: Ryusuke Konishi Date: Tue Jun 4 22:42:55 2024 +0900 nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors [ Upstream commit 7373a51e7998b508af7136530f3a997b286ce81c ] The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system. In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot. Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page. Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin commit 8394dce135733329c143097351e1893ade6a69cd Author: Matthew Wilcox (Oracle) Date: Mon Nov 27 23:30:25 2023 +0900 nilfs2: return the mapped address from nilfs_get_page() [ Upstream commit 09a46acb3697e50548bb265afa1d79163659dd85 ] In prepartion for switching from kmap() to kmap_local(), return the kmap address from nilfs_get_page() instead of having the caller look up page_address(). [konishi.ryusuke: fixed a missing blank line after declaration] Link: https://lkml.kernel.org/r/20231127143036.2425-7-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Ryusuke Konishi Signed-off-by: Andrew Morton Stable-dep-of: 7373a51e7998 ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by: Sasha Levin commit 39a143a2b072f239ea67e793731fd2b90301f4b7 Author: Andrii Nakryiko Date: Tue May 21 09:33:57 2024 -0700 bpf: fix multi-uprobe PID filtering logic [ Upstream commit 46ba0e49b64232adac35a2bc892f1710c5b0fb7f ] Current implementation of PID filtering logic for multi-uprobes in uprobe_prog_run() is filtering down to exact *thread*, while the intent for PID filtering it to filter by *process* instead. The check in uprobe_prog_run() also differs from the analogous one in uprobe_multi_link_filter() for some reason. The latter is correct, checking task->mm, not the task itself. Fix the check in uprobe_prog_run() to perform the same task->mm check. While doing this, we also update get_pid_task() use to use PIDTYPE_TGID type of lookup, given the intent is to get a representative task of an entire process. This doesn't change behavior, but seems more logical. It would hold task group leader task now, not any random thread task. Last but not least, given multi-uprobe support is half-broken due to this PID filtering logic (depending on whether PID filtering is important or not), we need to make it easy for user space consumers (including libbpf) to easily detect whether PID filtering logic was already fixed. We do it here by adding an early check on passed pid parameter. If it's negative (and so has no chance of being a valid PID), we return -EINVAL. Previous behavior would eventually return -ESRCH ("No process found"), given there can't be any process with negative PID. This subtle change won't make any practical change in behavior, but will allow applications to detect PID filtering fixes easily. Libbpf fixes take advantage of this in the next patch. Cc: stable@vger.kernel.org Acked-by: Jiri Olsa Fixes: b733eeade420 ("bpf: Add pid filter support for uprobe_multi link") Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin commit 7ec535ed8724d18ae4e714d2277a5b89450659d2 Author: Steven Rostedt (Google) Date: Thu May 23 01:14:28 2024 -0400 eventfs: Update all the eventfs_inodes from the events descriptor [ Upstream commit 340f0c7067a95281ad13734f8225f49c6cf52067 ] The change to update the permissions of the eventfs_inode had the misconception that using the tracefs_inode would find all the eventfs_inodes that have been updated and reset them on remount. The problem with this approach is that the eventfs_inodes are freed when they are no longer used (basically the reason the eventfs system exists). When they are freed, the updated eventfs_inodes are not reset on a remount because their tracefs_inodes have been freed. Instead, since the events directory eventfs_inode always has a tracefs_inode pointing to it (it is not freed when finished), and the events directory has a link to all its children, have the eventfs_remount() function only operate on the events eventfs_inode and have it descend into its children updating their uid and gids. Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJyVQ@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Reported-by: Masahiro Yamada Signed-off-by: Steven Rostedt (Google) Signed-off-by: Sasha Levin commit 1c88d94a7a336d1caf568be54ee408da0c572b90 Author: Sunil V L Date: Mon May 27 13:41:13 2024 +0530 irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails [ Upstream commit 0110c4b110477bb1f19b0d02361846be7ab08300 ] When riscv_intc_init_common() fails, the firmware node allocated is not freed. Add the missing free(). Fixes: 7023b9d83f03 ("irqchip/riscv-intc: Add ACPI support") Signed-off-by: Sunil V L Signed-off-by: Thomas Gleixner Reviewed-by: Anup Patel Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527081113.616189-1-sunilvl@ventanamicro.com Signed-off-by: Sasha Levin commit 85ca483e729d842bc453080b730fdaef84a75be9 Author: Yu Chien Peter Lin Date: Thu Feb 22 16:39:39 2024 +0800 irqchip/riscv-intc: Introduce Andes hart-level interrupt controller [ Upstream commit f4cc33e78ba8624a79ba8dea98ce5c85aa9ca33c ] Add support for the Andes hart-level interrupt controller. This controller provides interrupt mask/unmask functions to access the custom register (SLIE) where the non-standard S-mode local interrupt enable bits are located. The base of custom interrupt number is set to 256. To share the riscv_intc_domain_map() with the generic RISC-V INTC and ACPI, add a chip parameter to riscv_intc_init_common(), so it can be passed to the irq_domain_set_info() as a private data. Andes hart-level interrupt controller requires the "andestech,cpu-intc" compatible string to be present in interrupt-controller of cpu node to enable the use of custom local interrupt source. e.g., cpu0: cpu@0 { compatible = "andestech,ax45mp", "riscv"; ... cpu0-intc: interrupt-controller { #interrupt-cells = <0x01>; compatible = "andestech,cpu-intc", "riscv,cpu-intc"; interrupt-controller; }; }; Signed-off-by: Yu Chien Peter Lin Signed-off-by: Thomas Gleixner Reviewed-by: Randolph Reviewed-by: Anup Patel Link: https://lore.kernel.org/r/20240222083946.3977135-4-peterlin@andestech.com Stable-dep-of: 0110c4b11047 ("irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails") Signed-off-by: Sasha Levin commit 482095341313ad9686a6b966c05155132228208e Author: Yu Chien Peter Lin Date: Thu Feb 22 16:39:38 2024 +0800 irqchip/riscv-intc: Allow large non-standard interrupt number [ Upstream commit 96303bcb401c21dc1426d8d9bb1fc74aae5c02a9 ] Currently, the implementation of the RISC-V INTC driver uses the interrupt cause as the hardware interrupt number, with a maximum of 64 interrupts. However, the platform can expand the interrupt number further for custom local interrupts. To fully utilize the available local interrupt sources, switch to using irq_domain_create_tree() that creates the radix tree map, add global variables (riscv_intc_nr_irqs, riscv_intc_custom_base and riscv_intc_custom_nr_irqs) to determine the valid range of local interrupt number (hwirq). Signed-off-by: Yu Chien Peter Lin Signed-off-by: Thomas Gleixner Reviewed-by: Randolph Reviewed-by: Anup Patel Reviewed-by: Atish Patra Link: https://lore.kernel.org/r/20240222083946.3977135-3-peterlin@andestech.com Stable-dep-of: 0110c4b11047 ("irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails") Signed-off-by: Sasha Levin commit 01c987b8282c876e61a28325dbb9274be49e0ab9 Author: Dev Jain Date: Tue May 21 13:13:56 2024 +0530 selftests/mm: compaction_test: fix bogus test success on Aarch64 [ Upstream commit d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 ] Patch series "Fixes for compaction_test", v2. The compaction_test memory selftest introduces fragmentation in memory and then tries to allocate as many hugepages as possible. This series addresses some problems. On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since compaction_index becomes 0, which is less than 3, due to no division by zero exception being raised. We fix that by checking for division by zero. Secondly, correctly set the number of hugepages to zero before trying to set a large number of them. Now, consider a situation in which, at the start of the test, a non-zero number of hugepages have been already set (while running the entire selftests/mm suite, or manually by the admin). The test operates on 80% of memory to avoid OOM-killer invocation, and because some memory is already blocked by hugepages, it would increase the chance of OOM-killing. Also, since mem_free used in check_compaction() is the value before we set nr_hugepages to zero, the chance that the compaction_index will be small is very high if the preset nr_hugepages was high, leading to a bogus test success. This patch (of 3): Currently, if at runtime we are not able to allocate a huge page, the test will trivially pass on Aarch64 due to no exception being raised on division by zero while computing compaction_index. Fix that by checking for nr_hugepages == 0. Anyways, in general, avoid a division by zero by exiting the program beforehand. While at it, fix a typo, and handle the case where the number of hugepages may overflow an integer. Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain Cc: Anshuman Khandual Cc: Shuah Khan Cc: Sri Jayaramappa Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin commit 3f6ccd40afc49c7b0d633a9802720da0d1858889 Author: Mark Brown Date: Fri Feb 9 14:30:04 2024 +0000 selftests/mm: log a consistent test name for check_compaction [ Upstream commit f3b7568c49420d2dcd251032c9ca1e069ec8a6c9 ] Every test result report in the compaction test prints a distinct log messae, and some of the reports print a name that varies at runtime. This causes problems for automation since a lot of automation software uses the printed string as the name of the test, if the name varies from run to run and from pass to fail then the automation software can't identify that a test changed result or that the same tests are being run. Refactor the logging to use a consistent name when printing the result of the test, printing the existing messages as diagnostic information instead so they are still available for people trying to interpret the results. Link: https://lkml.kernel.org/r/20240209-kselftest-mm-cleanup-v1-2-a3c0386496b5@kernel.org Signed-off-by: Mark Brown Cc: Muhammad Usama Anjum Cc: Ryan Roberts Cc: Shuah Khan Signed-off-by: Andrew Morton Stable-dep-of: d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64") Signed-off-by: Sasha Levin commit d39532e9186ac47144289b4b5007a00f5f04bdf8 Author: Muhammad Usama Anjum Date: Mon Jan 1 13:36:12 2024 +0500 selftests/mm: conform test to TAP format output [ Upstream commit 9a21701edc41465de56f97914741bfb7bfc2517d ] Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240101083614.1076768-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum Cc: Shuah Khan Signed-off-by: Andrew Morton Stable-dep-of: d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64") Signed-off-by: Sasha Levin commit bb9bb13ce64cc7cae47f5e2ab9ce93b7bfa0117e Author: Miaohe Lin Date: Thu May 23 15:12:17 2024 +0800 mm/memory-failure: fix handling of dissolved but not taken off from buddy pages [ Upstream commit 8cf360b9d6a840700e06864236a01a883b34bbad ] When I did memory failure tests recently, below panic occurs: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page)) ------------[ cut here ]------------ kernel BUG at include/linux/page-flags.h:1009! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__del_page_from_free_list+0x151/0x180 RSP: 0018:ffffa49c90437998 EFLAGS: 00000046 RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0 RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69 R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80 R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009 FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0 Call Trace: __rmqueue_pcplist+0x23b/0x520 get_page_from_freelist+0x26b/0xe40 __alloc_pages_noprof+0x113/0x1120 __folio_alloc_noprof+0x11/0xb0 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130 __alloc_fresh_hugetlb_folio+0xe7/0x140 alloc_pool_huge_folio+0x68/0x100 set_max_huge_pages+0x13d/0x340 hugetlb_sysctl_handler_common+0xe8/0x110 proc_sys_call_handler+0x194/0x280 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff916114887 RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887 RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003 RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0 R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004 R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00 Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- And before the panic, there had an warning about bad page state: BUG: Bad page state in process page-types pfn:8cee00 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) page_type: 0xffffff7f(buddy) raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: mce_inject hwpoison_inject CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22 Call Trace: dump_stack_lvl+0x83/0xa0 bad_page+0x63/0xf0 free_unref_page+0x36e/0x5c0 unpoison_memory+0x50b/0x630 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xcd/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f189a514887 RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887 RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003 RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8 R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040 The root cause should be the below race: memory_failure try_memory_failure_hugetlb me_huge_page __page_handle_poison dissolve_free_hugetlb_folio drain_all_pages -- Buddy page can be isolated e.g. for compaction. take_page_off_buddy -- Failed as page is not in the buddy list. -- Page can be putback into buddy after compaction. page_ref_inc -- Leads to buddy page with refcnt = 1. Then unpoison_memory() can unpoison the page and send the buddy page back into buddy list again leading to the above bad page state warning. And bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to allocate this page. Fix this issue by only treating __page_handle_poison() as successful when it returns 1. Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin Cc: Naoya Horiguchi Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin commit fe01748ca6d6ad25d31eaa61d155f5ecf80907cc Author: Matthew Wilcox (Oracle) Date: Fri Nov 17 16:14:45 2023 +0000 memory-failure: use a folio in me_huge_page() [ Upstream commit b6fd410c32f1a66a52a42d6aae1ab7b011b74547 ] This function was already explicitly calling compound_head(); unfortunately the compiler can't know that and elide the redundant calls to compound_head() buried in page_mapping(), unlock_page(), etc. Switch to using a folio, which does let us elide these calls. Link: https://lkml.kernel.org/r/20231117161447.2461643-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Stable-dep-of: 8cf360b9d6a8 ("mm/memory-failure: fix handling of dissolved but not taken off from buddy pages") Signed-off-by: Sasha Levin commit 130b4b9478c3c3771c4d7dc50a90fe3808d61d76 Author: Gabor Juhos Date: Mon Mar 4 14:14:53 2024 +0100 firmware: qcom_scm: disable clocks if qcom_scm_bw_enable() fails [ Upstream commit 0c50b7fcf2773b4853e83fc15aba1a196ba95966 ] There are several functions which are calling qcom_scm_bw_enable() then returns immediately if the call fails and leaves the clocks enabled. Change the code of these functions to disable clocks when the qcom_scm_bw_enable() call fails. This also fixes a possible dma buffer leak in the qcom_scm_pas_init_image() function. Compile tested only due to lack of hardware with interconnect support. Cc: stable@vger.kernel.org Fixes: 65b7ebda5028 ("firmware: qcom_scm: Add bw voting support to the SCM interface") Signed-off-by: Gabor Juhos Reviewed-by: Mukesh Ojha Link: https://lore.kernel.org/r/20240304-qcom-scm-disable-clk-v1-1-b36e51577ca1@gmail.com Signed-off-by: Bjorn Andersson Signed-off-by: Sasha Levin commit 16ece7c5645a68efca22a0a69a7718c8e50c3232 Author: Namjae Jeon Date: Thu May 2 10:07:50 2024 +0900 ksmbd: use rwsem instead of rwlock for lease break [ Upstream commit d1c189c6cb8b0fb7b5ee549237d27889c40c2f8b ] lease break wait for lease break acknowledgment. rwsem is more suitable than unlock while traversing the list for parent lease break in ->m_op_list. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon Signed-off-by: Steve French Signed-off-by: Sasha Levin commit 6548d543a27449a1a3d8079925de93f5764d6f22 Author: Su Hui Date: Wed Jun 5 11:47:43 2024 +0800 net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool() [ Upstream commit 0dcc53abf58d572d34c5313de85f607cd33fc691 ] Clang static checker (scan-build) warning: net/ethtool/ioctl.c:line 2233, column 2 Called function pointer is null (null dereference). Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix this typo error. Fixes: 201ed315f967 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers") Signed-off-by: Su Hui Reviewed-by: Przemek Kitszel Reviewed-by: Hariprasad Kelam Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 7e796c3fefa8b17b30e7252886ae8cffacd2b9ef Author: Eric Dumazet Date: Tue Jun 4 19:35:49 2024 +0000 ipv6: fix possible race in __fib6_drop_pcpu_from() [ Upstream commit b01e1c030770ff3b4fe37fc7cc6bca03f594133f ] syzbot found a race in __fib6_drop_pcpu_from() [1] If compiler reads more than once (*ppcpu_rt), second read could read NULL, if another cpu clears the value in rt6_get_pcpu_route(). Add a READ_ONCE() to prevent this race. Also add rcu_read_lock()/rcu_read_unlock() because we rely on RCU protection while dereferencing pcpu_rt. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: netns cleanup_net RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984 Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48 RSP: 0018:ffffc900040df070 EFLAGS: 00010206 RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16 RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091 RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8 R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline] fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline] fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038 fib6_del_route net/ipv6/ip6_fib.c:1998 [inline] fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043 fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205 fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175 fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255 __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271 rt6_sync_down_dev net/ipv6/route.c:4906 [inline] rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911 addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855 addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992 call_netdevice_notifiers_extack net/core/dev.c:2030 [inline] call_netdevice_notifiers net/core/dev.c:2044 [inline] dev_close_many+0x333/0x6a0 net/core/dev.c:1585 unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193 unregister_netdevice_many net/core/dev.c:11276 [inline] default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Eric Dumazet Cc: Martin KaFai Lau Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit d8011254e9b123615dc31c00cf240986c4ff88eb Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:41 2024 -0700 af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill(). [ Upstream commit efaf24e30ec39ebbea9112227485805a48b0ceb1 ] While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock(). Let's use READ_ONCE() to read sk->sk_shutdown. Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 60db0759c4f52b27bf0fd72afbdeb5a4d92ecd96 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:40 2024 -0700 af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen(). [ Upstream commit 5d915e584d8408211d4567c22685aae8820bfc55 ] We can dump the socket queue length via UNIX_DIAG by specifying UDIAG_SHOW_RQLEN. If sk->sk_state is TCP_LISTEN, we return the recv queue length, but here we do not hold recvq lock. Let's use skb_queue_len_lockless() in sk_diag_show_rqlen(). Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 471ec7b77a8d45cd342a28ac44935d484a98ccea Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:39 2024 -0700 af_unix: Use skb_queue_empty_lockless() in unix_release_sock(). [ Upstream commit 83690b82d228b3570565ebd0b41873933238b97f ] If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock() checks the length of the peer socket's recvq under unix_state_lock(). However, unix_stream_read_generic() calls skb_unlink() after releasing the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks skb without unix_state_lock(). Thues, unix_state_lock() does not protect qlen. Let's use skb_queue_empty_lockless() in unix_release_sock(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit f1683d07ebd10464d3cc15ea613223e8d1a6f5fc Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:38 2024 -0700 af_unix: Use unix_recvq_full_lockless() in unix_stream_connect(). [ Upstream commit 45d872f0e65593176d880ec148f41ad7c02e40a7 ] Once sk->sk_state is changed to TCP_LISTEN, it never changes. unix_accept() takes advantage of this characteristics; it does not hold the listener's unix_state_lock() and only acquires recvq lock to pop one skb. It means unix_state_lock() does not prevent the queue length from changing in unix_stream_connect(). Thus, we need to use unix_recvq_full_lockless() to avoid data-race. Now we remove unix_recvq_full() as no one uses it. Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in unix_recvq_full_lockless() because of the following reasons: (1) For SOCK_DGRAM, it is a written-once field in unix_create1() (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the listener's unix_state_lock() in unix_listen(), and we hold the lock in unix_stream_connect() Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 29fce603b14b1140cbd5841e00080b6b01ba3430 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:37 2024 -0700 af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen. [ Upstream commit bd9f2d05731f6a112d0c7391a0d537bfc588dbe6 ] net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be changed concurrently. Let's use READ_ONCE() in unix_create1(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 996ec22ff576a6cf59d199f523aed57d105a98a8 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:36 2024 -0700 af_unix: Annotate data-races around sk->sk_sndbuf. [ Upstream commit b0632e53e0da8054e36bc973f0eec69d30f1b7c6 ] sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's not used in af_unix.c. Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(), unix_dgram_sendmsg(), and unix_stream_sendmsg(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 4398f59518ceccc2f34e21c87accdae5b0b064fd Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:35 2024 -0700 af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG. [ Upstream commit 0aa3be7b3e1f8f997312cc4705f8165e02806f8f ] While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read locklessly. Let's use READ_ONCE() there. Note that the result could be inconsistent if the socket is dumped during the state change. This is common for other SOCK_DIAG and similar interfaces. Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 0ede400c32ae9cd13b1eb916a8428d31085076d0 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:34 2024 -0700 af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb(). [ Upstream commit af4c733b6b1aded4dc808fafece7dfe6e9d2ebb3 ] unix_stream_read_skb() is called from sk->sk_data_ready() context where unix_state_lock() is not held. Let's use READ_ONCE() there. Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 776fcc45e3f415a898fea92ef8d22d8626ae356d Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:33 2024 -0700 af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg(). [ Upstream commit 8a34d4e8d9742a24f74998f45a6a98edd923319b ] The following functions read sk->sk_state locklessly and proceed only if the state is TCP_ESTABLISHED. * unix_stream_sendmsg * unix_stream_read_generic * unix_seqpacket_sendmsg * unix_seqpacket_recvmsg Let's use READ_ONCE() there. Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 3d25de6486f43a561d7443027734fde94551a130 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:31 2024 -0700 af_unix: Annotate data-race of sk->sk_state in unix_stream_connect(). [ Upstream commit a9bf9c7dc6a5899c01cb8f6e773a66315a5cd4b7 ] As small optimisation, unix_stream_connect() prefetches the client's sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE. Later, sk->sk_state is checked again under unix_state_lock(). Let's use READ_ONCE() for the first check and TCP_CLOSE directly for the second check. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 484e036e1a2c1851c3159c4983b29116acc2624b Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:30 2024 -0700 af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll(). [ Upstream commit eb0718fb3e97ad0d6f4529b810103451c90adf94 ] unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and calls unix_writable() which also reads sk->sk_state without holding unix_state_lock(). Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass it to unix_writable(). While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as that state does not exist for AF_UNIX socket since the code was added. Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners") Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 4e38d6c04943a52ee8f8cc87bb0e9040647a35fb Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:29 2024 -0700 af_unix: Annotate data-race of sk->sk_state in unix_inq_len(). [ Upstream commit 3a0f38eb285c8c2eead4b3230c7ac2983707599d ] ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first and returns -EINVAL if it's TCP_LISTEN. Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of bytes in recvq. However, unix_inq_len() does not hold unix_state_lock(), and the concurrent listen() might change the state after checking sk->sk_state. If the race occurs, 0 is returned for the listener, instead of -EINVAL, because the length of skb with embryo is 0. We could hold unix_state_lock() in unix_inq_len(), but it's overkill given the result is true for pre-listen() TCP_CLOSE state. So, let's use READ_ONCE() for sk->sk_state in unix_inq_len(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 45733e981e8cac0fd85ced9e4f1f8d71c3988d04 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:28 2024 -0700 af_unix: Annodate data-races around sk->sk_state for writers. [ Upstream commit 942238f9735a4a4ebf8274b218d9a910158941d1 ] sk->sk_state is changed under unix_state_lock(), but it's read locklessly in many places. This patch adds WRITE_ONCE() on the writer side. We will add READ_ONCE() to the lockless readers in the following patches. Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 8003545ca10d2028b226c9f2f3946bec4d9e60e6 Author: Kuniyuki Iwashima Date: Tue Jun 4 09:52:27 2024 -0700 af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer. [ Upstream commit 26bfb8b57063f52b867f9b6c8d1742fcb5bd656c ] When a SOCK_DGRAM socket connect()s to another socket, the both sockets' sk->sk_state are changed to TCP_ESTABLISHED so that we can register them to BPF SOCKMAP. When the socket disconnects from the peer by connect(AF_UNSPEC), the state is set back to TCP_CLOSE. Then, the peer's state is also set to TCP_CLOSE, but the update is done locklessly and unconditionally. Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects from B. After the first two connect()s, all three sockets' sk->sk_state are TCP_ESTABLISHED: $ ss -xa Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess u_dgr ESTAB 0 0 @A 641 * 642 u_dgr ESTAB 0 0 @B 642 * 643 u_dgr ESTAB 0 0 @C 643 * 0 And after the disconnect, B's state is TCP_CLOSE even though it's still connected to C and C's state is TCP_ESTABLISHED. $ ss -xa Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess u_dgr UNCONN 0 0 @A 641 * 0 u_dgr UNCONN 0 0 @B 642 * 643 u_dgr ESTAB 0 0 @C 643 * 0 In this case, we cannot register B to SOCKMAP. So, when a socket disconnects from the peer, we should not set TCP_CLOSE to the peer if the peer is connected to yet another socket, and this must be done under unix_state_lock(). Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless readers. These data-races will be fixed in the following patches. Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 040d9384870386eb5dc55472ac573ac7756b2050 Author: Aleksandr Mishin Date: Tue Jun 4 11:25:00 2024 +0300 net: wwan: iosm: Fix tainted pointer delete is case of region creation fail [ Upstream commit b0c9a26435413b81799047a7be53255640432547 ] In case of region creation fail in ipc_devlink_create_region(), previously created regions delete process starts from tainted pointer which actually holds error code value. Fix this bug by decreasing region index before delete. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 4dcd183fbd67 ("net: wwan: iosm: devlink registration") Signed-off-by: Aleksandr Mishin Acked-by: Sergey Ryazanov Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240604082500.20769-1-amishin@t-argos.ru Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit 649b63f5daf66853042b8c11051770a6d4833dc0 Author: Larysa Zaremba Date: Mon Jun 3 14:42:33 2024 -0700 ice: add flag to distinguish reset from .ndo_bpf in XDP rings config [ Upstream commit 744d197162c2070a6045a71e2666ed93a57cc65d ] Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved. Also, dependency on the PF state reset flags creates an additional, fortunately less common problem: * PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success. Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki Signed-off-by: Larysa Zaremba Reviewed-by: Simon Horman Tested-by: Chandan Kumar Rout Signed-off-by: Jacob Keller Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit eab834acb474f2c109c0df7ca05f31ff00d75f0a Author: Larysa Zaremba Date: Mon Jun 3 14:42:32 2024 -0700 ice: remove af_xdp_zc_qps bitmap [ Upstream commit adbf5a42341f6ea038d3626cd4437d9f0ad0b2dd ] Referenced commit has introduced a bitmap to distinguish between ZC and copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this for us. The bitmap would be especially useful when restoring previous state after rebuild, if only it was not reallocated in the process. This leads to e.g. xdpsock dying after changing number of queues. Instead of preserving the bitmap during the rebuild, remove it completely and distinguish between ZC and copy-mode queues based on the presence of a device associated with the pool. Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap") Reviewed-by: Przemek Kitszel Signed-off-by: Larysa Zaremba Reviewed-by: Simon Horman Tested-by: Chandan Kumar Rout Signed-off-by: Jacob Keller Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 3201ba7d1c8c081b8bae94ca57f17bfc73e04131 Author: Jacob Keller Date: Mon Jun 3 14:42:30 2024 -0700 ice: fix iteration of TLVs in Preserved Fields Area [ Upstream commit 03e4a092be8ce3de7c1baa7ae14e68b64e3ea644 ] The ice_get_pfa_module_tlv() function iterates over the Type-Length-Value structures in the Preserved Fields Area (PFA) of the NVM. This is used by the driver to access data such as the Part Board Assembly identifier. The function uses simple logic to iterate over the PFA. First, the pointer to the PFA in the NVM is read. Then the total length of the PFA is read from the first word. A pointer to the first TLV is initialized, and a simple loop iterates over each TLV. The pointer is moved forward through the NVM until it exceeds the PFA area. The logic seems sound, but it is missing a key detail. The Preserved Fields Area length includes one additional final word. This is documented in the device data sheet as a dummy word which contains 0xFFFF. All NVMs have this extra word. If the driver tries to scan for a TLV that is not in the PFA, it will read past the size of the PFA. It reads and interprets the last dummy word of the PFA as a TLV with type 0xFFFF. It then reads the word following the PFA as a length. The PFA resides within the Shadow RAM portion of the NVM, which is relatively small. All of its offsets are within a 16-bit size. The PFA pointer and TLV pointer are stored by the driver as 16-bit values. In almost all cases, the word following the PFA will be such that interpreting it as a length will result in 16-bit arithmetic overflow. Once overflowed, the new next_tlv value is now below the maximum offset of the PFA. Thus, the driver will continue to iterate the data as TLVs. In the worst case, the driver hits on a sequence of reads which loop back to reading the same offsets in an endless loop. To fix this, we need to correct the loop iteration check to account for this extra word at the end of the PFA. This alone is sufficient to resolve the known cases of this issue in the field. However, it is plausible that an NVM could be misconfigured or have corrupt data which results in the same kind of overflow. Protect against this by using check_add_overflow when calculating both the maximum offset of the TLVs, and when calculating the next_tlv offset at the end of each loop iteration. This ensures that the driver will not get stuck in an infinite loop when scanning the PFA. Fixes: e961b679fb0b ("ice: add board identifier info to devlink .info_get") Co-developed-by: Paul Greenwalt Signed-off-by: Paul Greenwalt Reviewed-by: Przemek Kitszel Tested-by: Pucha Himasekhar Reddy Signed-off-by: Jacob Keller Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-1-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit b21bb09f8be67ba6b255004157a9fc92f71f183d Author: Karol Kolacinski Date: Tue Jun 4 14:05:27 2024 +0200 ptp: Fix error message on failed pin verification [ Upstream commit 323a359f9b077f382f4483023d096a4d316fd135 ] On failed verification of PTP clock pin, error message prints channel number instead of pin index after "pin", which is incorrect. Fix error message by adding channel number to the message and printing pin number instead of channel number. Fixes: 6092315dfdec ("ptp: introduce programmable pins.") Signed-off-by: Karol Kolacinski Acked-by: Richard Cochran Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 724050ae4b76e4fae05a923cb54101d792cf4404 Author: Eric Dumazet Date: Tue Jun 4 18:15:11 2024 +0000 net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP [ Upstream commit f921a58ae20852d188f70842431ce6519c4fdc36 ] If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided, taprio_parse_mqprio_opt() must validate it, or userspace can inject arbitrary data to the kernel, the second time taprio_change() is called. First call (with valid attributes) sets dev->num_tc to a non zero value. Second call (with arbitrary mqprio attributes) returns early from taprio_parse_mqprio_opt() and bad things can happen. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: Noam Rathaus Signed-off-by: Eric Dumazet Acked-by: Vinicius Costa Gomes Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit d857df86837ac1c30592e8a068204d16feac9930 Author: Aleksandr Mishin Date: Tue Jun 4 13:05:52 2024 +0300 net/mlx5: Fix tainted pointer delete is case of flow rules creation fail [ Upstream commit 229bedbf62b13af5aba6525ad10b62ad38d9ccb5 ] In case of flow rule creation fail in mlx5_lag_create_port_sel_table(), instead of previously created rules, the tainted pointer is deleted deveral times. Fix this bug by using correct flow rules pointers. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Aleksandr Mishin Reviewed-by: Jacob Keller Reviewed-by: Tariq Toukan Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 6ccada6ffb42e0ac75e3db06d41baf5a7f483f8a Author: Shay Drory Date: Tue Jun 4 00:04:43 2024 +0300 net/mlx5: Always stop health timer during driver removal [ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ] Currently, if teardown_hca fails to execute during driver removal, mlx5 does not stop the health timer. Afterwards, mlx5 continue with driver teardown. This may lead to a UAF bug, which results in page fault Oops[1], since the health timer invokes after resources were freed. Hence, stop the health monitor even if teardown_hca fails. [1] mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: cleanup mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup BUG: unable to handle page fault for address: ffffa26487064230 PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 RIP: 0010:ioread32be+0x34/0x60 RSP: 0018:ffffa26480003e58 EFLAGS: 00010292 RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0 RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230 RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8 R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0 R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0 FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x175/0x180 ? asm_exc_page_fault+0x26/0x30 ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? ioread32be+0x34/0x60 mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] poll_health+0x42/0x230 [mlx5_core] ? __next_timer_interrupt+0xbc/0x110 ? __pfx_poll_health+0x10/0x10 [mlx5_core] call_timer_fn+0x21/0x130 ? __pfx_poll_health+0x10/0x10 [mlx5_core] __run_timers+0x222/0x2c0 run_timer_softirq+0x1d/0x40 __do_softirq+0xc9/0x2c8 __irq_exit_rcu+0xa6/0xc0 sysvec_apic_timer_interrupt+0x72/0x90 asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:cpuidle_enter_state+0xcc/0x440 ? cpuidle_enter_state+0xbd/0x440 cpuidle_enter+0x2d/0x40 do_idle+0x20d/0x270 cpu_startup_entry+0x2a/0x30 rest_init+0xd0/0xd0 arch_call_rest_init+0xe/0x30 start_kernel+0x709/0xa90 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0x96/0xa0 secondary_startup_64_no_verify+0x18f/0x19b ---[ end trace 0000000000000000 ]--- Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit e3001df36cd60aa89ce232a520732cc19f9a5810 Author: Moshe Shemesh Date: Tue Jun 4 00:04:42 2024 +0300 net/mlx5: Stop waiting for PCI if pci channel is offline [ Upstream commit 33afbfcc105a572159750f2ebee834a8a70fdd96 ] In case pci channel becomes offline the driver should not wait for PCI reads during health dump and recovery flow. The driver has timeout for each of these loops trying to read PCI, so it would fail anyway. However, in case of recovery waiting till timeout may cause the pci error_detected() callback fail to meet pci_dpc_recovered() wait timeout. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drori Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 3dd41669b300d63bcec53787560929cbcfabf25c Author: Jason Xing Date: Tue Jun 4 01:02:17 2024 +0800 mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB [ Upstream commit 9633e9377e6af0244f7381e86b9aac5276f5be97 ] Like previous patch does in TCP, we need to adhere to RFC 1213: "tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT." So let's consider CLOSE-WAIT sockets. The logic of counting When we increment the counter? a) Only if we change the state to ESTABLISHED. When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK. Fixes: d9cd27b8cd19 ("mptcp: add CurrEstab MIB counter support") Signed-off-by: Jason Xing Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit acdf17546ef8ee73c94e442e3f4b933e42c3dfac Author: Jason Xing Date: Tue Jun 4 01:02:16 2024 +0800 tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB [ Upstream commit a46d0ea5c94205f40ecf912d1bb7806a8a64704f ] According to RFC 1213, we should also take CLOSE-WAIT sockets into consideration: "tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT." After this, CurrEstab counter will display the total number of ESTABLISHED and CLOSE-WAIT sockets. The logic of counting When we increment the counter? a) if we change the state to ESTABLISHED. b) if we change the state from SYN-RECEIVED to CLOSE-WAIT. When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK. Please note: there are two chances that old state of socket can be changed to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED. So we have to take care of the former case. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason Xing Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 924f7bbfc5cfd029e417c56357ca01eae681fba6 Author: Daniel Borkmann Date: Mon Jun 3 10:59:26 2024 +0200 vxlan: Fix regression when dropping packets due to invalid src addresses [ Upstream commit 1cd4bc987abb2823836cbb8f887026011ccddc8a ] Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") has recently been added to vxlan mainly in the context of source address snooping/learning so that when it is enabled, an entry in the FDB is not being created for an invalid address for the corresponding tunnel endpoint. Before commit f58f45c1e5b9 vxlan was similarly behaving as geneve in that it passed through whichever macs were set in the L2 header. It turns out that this change in behavior breaks setups, for example, Cilium with netkit in L3 mode for Pods as well as tunnel mode has been passing before the change in f58f45c1e5b9 for both vxlan and geneve. After mentioned change it is only passing for geneve as in case of vxlan packets are dropped due to vxlan_set_mac() returning false as source and destination macs are zero which for E/W traffic via tunnel is totally fine. Fix it by only opting into the is_valid_ether_addr() check in vxlan_set_mac() when in fact source address snooping/learning is actually enabled in vxlan. This is done by moving the check into vxlan_snoop(). With this change, the Cilium connectivity test suite passes again for both tunnel flavors. Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") Signed-off-by: Daniel Borkmann Cc: David Bauer Cc: Ido Schimmel Cc: Nikolay Aleksandrov Cc: Martin KaFai Lau Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: David Bauer Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 54c2c171c11a798fe887b3ff72922aa9d1411c1e Author: Hangyu Hua Date: Mon Jun 3 15:13:03 2024 +0800 net: sched: sch_multiq: fix possible OOB write in multiq_tune() [ Upstream commit affc18fdc694190ca7575b9a86632a73b9fe043d ] q->bands will be assigned to qopt->bands to execute subsequent code logic after kmalloc. So the old q->bands should not be used in kmalloc. Otherwise, an out-of-bounds write will occur. Fixes: c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock") Signed-off-by: Hangyu Hua Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit cba5467442b2d5880f5235c7c3a7317d5a9ecbfc Author: Tristram Ha Date: Thu May 30 18:38:01 2024 -0700 net: phy: Micrel KSZ8061: fix errata solution not taking effect problem [ Upstream commit 0a8d3f2e3e8d8aea8af017e14227b91d5989b696 ] KSZ8061 needs to write to a MMD register at driver initialization to fix an errata. This worked in 5.0 kernel but not in newer kernels. The issue is the main phylib code no longer resets PHY at the very beginning. Calling phy resuming code later will reset the chip if it is already powered down at the beginning. This wipes out the MMD register write. Solution is to implement a phy resume function for KSZ8061 to take care of this problem. Fixes: 232ba3a51cc2 ("net: phy: Micrel KSZ8061: link failure after cable connect") Signed-off-by: Tristram Ha Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b03255294e88a73583fb39d496cb17775145e09b Author: Wen Gu Date: Fri May 31 16:54:17 2024 +0800 net/smc: avoid overwriting when adjusting sock bufsizes [ Upstream commit fb0aa0781a5f457e3864da68af52c3b1f4f7fd8f ] When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf to sysctl_tcp_wmem[1], since this may overwrite the value set by tcp_sndbuf_expand() in TCP connection establishment. And the other setting sk_{snd|rcv}buf to sysctl value in smc_adjust_sock_bufsizes() can also be omitted since the initialization of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1]. Fixes: 30c3c4a4497c ("net/smc: Use correct buffer sizes when switching between TCP and SMC") Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com Signed-off-by: Wen Gu Reviewed-by: Wenjia Zhang Reviewed-by: Gerd Bayer , too. Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 2d7912f3ac6545afe479736a6970c0ee00ffd66c Author: Subbaraya Sundeep Date: Wed May 29 20:59:44 2024 +0530 octeontx2-af: Always allocate PF entries from low prioriy zone [ Upstream commit 8b0f7410942cdc420c4557eda02bfcdf60ccec17 ] PF mcam entries has to be at low priority always so that VF can install longest prefix match rules at higher priority. This was taken care currently but when priority allocation wrt reference entry is requested then entries are allocated from mid-zone instead of low priority zone. Fix this and always allocate entries from low priority zone for PFs. Fixes: 7df5b4b260dd ("octeontx2-af: Allocate low priority entries for PF") Signed-off-by: Subbaraya Sundeep Reviewed-by: Jacob Keller Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit d387805d4b4a46ee01e3dae133c81b6d80195e5b Author: Jiri Olsa Date: Tue Jun 4 17:00:24 2024 +0200 bpf: Set run context for rawtp test_run callback [ Upstream commit d0d1df8ba18abc57f28fb3bc053b2bf319367f2c ] syzbot reported crash when rawtp program executed through the test_run interface calls bpf_get_attach_cookie helper or any other helper that touches task->bpf_ctx pointer. Setting the run context (task->bpf_ctx pointer) for test_run callback. Fixes: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value") Reported-by: syzbot+3ab78ff125b7979e45f9@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa Signed-off-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9 Link: https://lore.kernel.org/bpf/20240604150024.359247-1-jolsa@kernel.org Signed-off-by: Sasha Levin commit 50569d12945f86fa4b321c4b1c3005874dbaa0f1 Author: Jakub Kicinski Date: Thu May 30 16:26:07 2024 -0700 net: tls: fix marking packets as decrypted [ Upstream commit a535d59432370343058755100ee75ab03c0e3f91 ] For TLS offload we mark packets with skb->decrypted to make sure they don't escape the host without getting encrypted first. The crypto state lives in the socket, so it may get detached by a call to skb_orphan(). As a safety check - the egress path drops all packets with skb->decrypted and no "crypto-safe" socket. The skb marking was added to sendpage only (and not sendmsg), because tls_device injected data into the TCP stack using sendpage. This special case was missed when sendpage got folded into sendmsg. Fixes: c5c37af6ecad ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES") Signed-off-by: Jakub Kicinski Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin commit f8dd092e8b47dc43c1b0e136bb1926f9f75ac528 Author: Eric Dumazet Date: Fri May 31 13:26:34 2024 +0000 ipv6: sr: block BH in seg6_output_core() and seg6_input_core() [ Upstream commit c0b98ac1cc104f48763cdb27b1e9ac25fd81fc90 ] As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in seg6_output_core() is not good enough, because seg6_output_core() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter seg6_output_core() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in seg6_input_core(). Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled") Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Eric Dumazet Cc: David Lebrun Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit db21c1ee6b6dc033032e82524a3b68e7bcf9bbb3 Author: Eric Dumazet Date: Fri May 31 13:26:32 2024 +0000 ipv6: ioam: block BH from ioam6_output() [ Upstream commit 2fe40483ec257de2a0d819ef88e3e76c7e261319 ] As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in ioam6_output() is not good enough, because ioam6_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter ioam6_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Eric Dumazet Cc: Justin Iurman Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 9ee14af24e67ef170108db547f7d1f701b3f2bc5 Author: Matthias Stocker Date: Fri May 31 12:37:11 2024 +0200 vmxnet3: disable rx data ring on dma allocation failure [ Upstream commit ffbe335b8d471f79b259e950cb20999700670456 ] When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base, the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset rq->data_ring.desc_size for the data ring that failed, which presumably causes the hypervisor to reference it on packet reception. To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell the hypervisor to disable this feature. [ 95.436876] kernel BUG at net/core/skbuff.c:207! [ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1 [ 95.441558] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018 [ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f [ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9 ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24 [ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246 [ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f [ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f [ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60 [ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000 [ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0 [ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000 [ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0 [ 95.459791] Call Trace: [ 95.460515] [ 95.461180] ? __die_body.cold+0x19/0x27 [ 95.462150] ? die+0x2e/0x50 [ 95.462976] ? do_trap+0xca/0x110 [ 95.463973] ? do_error_trap+0x6a/0x90 [ 95.464966] ? skb_panic+0x4d/0x4f [ 95.465901] ? exc_invalid_op+0x50/0x70 [ 95.466849] ? skb_panic+0x4d/0x4f [ 95.467718] ? asm_exc_invalid_op+0x1a/0x20 [ 95.468758] ? skb_panic+0x4d/0x4f [ 95.469655] skb_put.cold+0x10/0x10 [ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3] [ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3] [ 95.473185] __napi_poll+0x2b/0x160 [ 95.474145] net_rx_action+0x2c6/0x3b0 [ 95.475115] handle_softirqs+0xe7/0x2a0 [ 95.476122] __irq_exit_rcu+0x97/0xb0 [ 95.477109] common_interrupt+0x85/0xa0 [ 95.478102] [ 95.478846] [ 95.479603] asm_common_interrupt+0x26/0x40 [ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 [ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246 [ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000 [ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001 [ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3 [ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260 [ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000 [ 95.495035] acpi_safe_halt+0x14/0x20 [ 95.496127] acpi_idle_do_entry+0x2f/0x50 [ 95.497221] acpi_idle_enter+0x7f/0xd0 [ 95.498272] cpuidle_enter_state+0x81/0x420 [ 95.499375] cpuidle_enter+0x2d/0x40 [ 95.500400] do_idle+0x1e5/0x240 [ 95.501385] cpu_startup_entry+0x29/0x30 [ 95.502422] start_secondary+0x11c/0x140 [ 95.503454] common_startup_64+0x13e/0x141 [ 95.504466] [ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 95.516536] ---[ end trace 0000000000000000 ]--- Fixes: 6f4833383e85 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()") Signed-off-by: Matthias Stocker Reviewed-by: Subbaraya Sundeep Reviewed-by: Ronak Doshi Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 834aa2c34b8f71248c3475b8eaac4a0f67da1aa1 Author: Ravi Bangoria Date: Fri May 31 04:46:44 2024 +0000 KVM: SEV-ES: Delegate LBR virtualization to the processor [ Upstream commit b7e4be0a224fe5c6be30c1c8bdda8d2317ad6ba4 ] As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. Although KVM currently enforces LBRV for SEV-ES guests, there are multiple issues with it: o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR interception is used to dynamically toggle LBRV for performance reasons, this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3: [guest ~]# wrmsr 0x1d9 0x4 KVM: entry failed, hardware error 0xffffffff EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR is of swap type A. o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA encryption. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Co-developed-by: Nikunj A Dadhania Signed-off-by: Nikunj A Dadhania Signed-off-by: Ravi Bangoria Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin commit b6e4076ca94be82f202b16e5aecd3d025c595141 Author: Michael Roth Date: Mon Oct 16 08:27:32 2023 -0500 KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests [ Upstream commit a26b7cd2254695f8258cc370f33280db0a9a3813 ] When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out the guest-defined values while context-switching to/from guest mode. However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set, so the guest-defined value is effectively ignored when switching to guest mode with the understanding that the VMSA will handle swapping in/out this register state. However, SVM is still configured to intercept these accesses for SEV-ES guests, so the values in the initial MSR_IA32_XSS are effectively read-only, and a guest will experience undefined behavior if it actually tries to write to this MSR. Fortunately, only CET/shadowstack makes use of this register on SEV-ES-capable systems currently, which isn't yet widely used, but this may become more of an issue in the future. Additionally, enabling intercepts of MSR_IA32_XSS results in #VC exceptions in the guest in certain paths that can lead to unexpected #VC nesting levels. One example is SEV-SNP guests when handling #VC exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then generate another #VC when accessing MSR_IA32_XSS, which can lead to guest crashes if an NMI occurs at that point in time. Running perf on a guest while it is issuing such a sequence is one example where these can be problematic. Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests if the host/guest configuration allows it. If the host/guest configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so that it can be caught by the existing checks in kvm_{set,get}_msr_common() if the guest still attempts to access it. Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Cc: Alexey Kardashevskiy Suggested-by: Tom Lendacky Signed-off-by: Michael Roth Message-Id: <20231016132819.1002933-4-michael.roth@amd.com> Signed-off-by: Paolo Bonzini Stable-dep-of: b7e4be0a224f ("KVM: SEV-ES: Delegate LBR virtualization to the processor") Signed-off-by: Sasha Levin commit 2128bae4ecabff2fa232f91ebf9421c767ce7e77 Author: Ravi Bangoria Date: Fri May 31 04:46:43 2024 +0000 KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent [ Upstream commit d922056215617eedfbdbc29fe49953423686fe5e ] As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. So, prevent SEV-ES guests when LBRV support is missing. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Signed-off-by: Ravi Bangoria Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin commit 91cff53136daeff50816b0baeafd38a6976f6209 Author: Cong Wang Date: Sun Jun 2 11:27:03 2024 -0700 bpf: Fix a potential use-after-free in bpf_link_free() [ Upstream commit 2884dc7d08d98a89d8d65121524bb7533183a63a ] After commit 1a80dbcb2dba, bpf_link can be freed by link->ops->dealloc_deferred, but the code still tests and uses link->ops->dealloc afterward, which leads to a use-after-free as reported by syzbot. Actually, one of them should be sufficient, so just call one of them instead of both. Also add a WARN_ON() in case of any problematic implementation. Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period") Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com Signed-off-by: Cong Wang Signed-off-by: Daniel Borkmann Acked-by: Jiri Olsa Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com Signed-off-by: Sasha Levin commit 2ad2f2edb944baf2735b23c7008b3dbe5b8da56c Author: Hou Tao Date: Mon Dec 4 22:04:23 2023 +0800 bpf: Optimize the free of inner map [ Upstream commit af66bfd3c8538ed21cf72af18426fc4a408665cf ] When removing the inner map from the outer map, the inner map will be freed after one RCU grace period and one RCU tasks trace grace period, so it is certain that the bpf program, which may access the inner map, has exited before the inner map is freed. However there is no need to wait for one RCU tasks trace grace period if the outer map is only accessed by non-sleepable program. So adding sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding the outer map into env->used_maps for sleepable program. Although the max number of bpf program is INT_MAX - 1, the number of bpf programs which are being loaded may be greater than INT_MAX, so using atomic64_t instead of atomic_t for sleepable_refcnt. When removing the inner map from the outer map, using sleepable_refcnt to decide whether or not a RCU tasks trace grace period is needed before freeing the inner map. Signed-off-by: Hou Tao Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov Stable-dep-of: 2884dc7d08d9 ("bpf: Fix a potential use-after-free in bpf_link_free()") Signed-off-by: Sasha Levin commit 5aa03dd388d1d6a369bfaa538b9e09c7163b70b5 Author: Jiri Olsa Date: Sat Nov 25 20:31:26 2023 +0100 bpf: Store ref_ctr_offsets values in bpf_uprobe array [ Upstream commit 4930b7f53a298533bc31d7540b6ea8b79a000331 ] We will need to return ref_ctr_offsets values through link_info interface in following change, so we need to keep them around. Storing ref_ctr_offsets values directly into bpf_uprobe array. Signed-off-by: Jiri Olsa Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20231125193130.834322-3-jolsa@kernel.org Stable-dep-of: 2884dc7d08d9 ("bpf: Fix a potential use-after-free in bpf_link_free()") Signed-off-by: Sasha Levin commit 02a255723e6b427fe68c921e4b86cba05dcaee52 Author: Tristram Ha Date: Tue May 28 19:20:23 2024 -0700 net: phy: micrel: fix KSZ9477 PHY issues after suspend/resume [ Upstream commit 6149db4997f582e958da675092f21c666e3b67b7 ] When the PHY is powered up after powered down most of the registers are reset, so the PHY setup code needs to be done again. In addition the interrupt register will need to be setup again so that link status indication works again. Fixes: 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver") Signed-off-by: Tristram Ha Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 645e643eeb9a414df832b065bc46f94145a25462 Author: DelphineCCChiu Date: Wed May 29 14:58:55 2024 +0800 net/ncsi: Fix the multi thread manner of NCSI driver [ Upstream commit e85e271dec0270982afed84f70dc37703fcc1d52 ] Currently NCSI driver will send several NCSI commands back to back without waiting the response of previous NCSI command or timeout in some state when NIC have multi channel. This operation against the single thread manner defined by NCSI SPEC(section 6.3.2.3 in DSP0222_1.1.1) According to NCSI SPEC(section 6.2.13.1 in DSP0222_1.1.1), we should probe one channel at a time by sending NCSI commands (Clear initial state, Get version ID, Get capabilities...), than repeat this steps until the max number of channels which we got from NCSI command (Get capabilities) has been probed. Fixes: e6f44ed6d04d ("net/ncsi: Package and channel management") Signed-off-by: DelphineCCChiu Link: https://lore.kernel.org/r/20240529065856.825241-1-delphine_cc_chiu@wiwynn.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit d7dd9d1f02b2e481f3891314d587826382bfc8dd Author: Peter Delevoryas Date: Tue Nov 14 10:07:33 2023 -0600 net/ncsi: Simplify Kconfig/dts control flow [ Upstream commit c797ce168930ce3d62a9b7fc4d7040963ee6a01e ] Background: 1. CONFIG_NCSI_OEM_CMD_KEEP_PHY If this is enabled, we send an extra OEM Intel command in the probe sequence immediately after discovering a channel (e.g. after "Clear Initial State"). 2. CONFIG_NCSI_OEM_CMD_GET_MAC If this is enabled, we send one of 3 OEM "Get MAC Address" commands from Broadcom, Mellanox (Nvidida), and Intel in the *configuration* sequence for a channel. 3. mellanox,multi-host (or mlx,multi-host) Introduced by this patch: https://lore.kernel.org/all/20200108234341.2590674-1-vijaykhemka@fb.com/ Which was actually originally from cosmo.chou@quantatw.com: https://github.com/facebook/openbmc-linux/commit/9f132a10ec48db84613519258cd8a317fb9c8f1b Cosmo claimed that the Nvidia ConnectX-4 and ConnectX-6 NIC's don't respond to Get Version ID, et. al in the probe sequence unless you send the Set MC Affinity command first. Problem Statement: We've been using a combination of #ifdef code blocks and IS_ENABLED() conditions to conditionally send these OEM commands. It makes adding any new code around these commands hard to understand. Solution: In this patch, I just want to remove the conditionally compiled blocks of code, and always use IS_ENABLED(...) to do dynamic control flow. I don't think the small amount of code this adds to non-users of the OEM Kconfigs is a big deal. Signed-off-by: Peter Delevoryas Signed-off-by: David S. Miller Stable-dep-of: e85e271dec02 ("net/ncsi: Fix the multi thread manner of NCSI driver") Signed-off-by: Sasha Levin commit 87cc2514162f1d7a9fded97204b8396ef962e0c9 Author: Duoming Zhou Date: Thu May 30 13:17:33 2024 +0800 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() [ Upstream commit 166fcf86cd34e15c7f383eda4642d7a212393008 ] The object "ax25_dev" is managed by reference counting. Thus it should not be directly released by kfree(), replace with ax25_dev_put(). Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Suggested-by: Dan Carpenter Signed-off-by: Duoming Zhou Reviewed-by: Dan Carpenter Link: https://lore.kernel.org/r/20240530051733.11416-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 52100fd74ad07b53a4666feafff1cd11436362d3 Author: Lars Kellogg-Stedman Date: Wed May 29 17:02:43 2024 -0400 ax25: Fix refcount imbalance on inbound connections [ Upstream commit 3c34fb0bd4a4237592c5ecb5b2e2531900c55774 ] When releasing a socket in ax25_release(), we call netdev_put() to decrease the refcount on the associated ax.25 device. However, the execution path for accepting an incoming connection never calls netdev_hold(). This imbalance leads to refcount errors, and ultimately to kernel crashes. A typical call trace for the above situation will start with one of the following errors: refcount_t: decrement hit 0; leaking memory. refcount_t: underflow; use-after-free. And will then have a trace like: Call Trace: ? show_regs+0x64/0x70 ? __warn+0x83/0x120 ? refcount_warn_saturate+0xb2/0x100 ? report_bug+0x158/0x190 ? prb_read_valid+0x20/0x30 ? handle_bug+0x3e/0x70 ? exc_invalid_op+0x1c/0x70 ? asm_exc_invalid_op+0x1f/0x30 ? refcount_warn_saturate+0xb2/0x100 ? refcount_warn_saturate+0xb2/0x100 ax25_release+0x2ad/0x360 __sock_release+0x35/0xa0 sock_close+0x19/0x20 [...] On reboot (or any attempt to remove the interface), the kernel gets stuck in an infinite loop: unregister_netdevice: waiting for ax0 to become free. Usage count = 0 This patch corrects these issues by ensuring that we call netdev_hold() and ax25_dev_hold() for new connections in ax25_accept(). This makes the logic leading to ax25_accept() match the logic for ax25_bind(): in both cases we increment the refcount, which is ultimately decremented in ax25_release(). Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Lars Kellogg-Stedman Tested-by: Duoming Zhou Tested-by: Dan Cross Tested-by: Chris Maness Reviewed-by: Dan Carpenter Link: https://lore.kernel.org/r/20240529210242.3346844-2-lars@oddbit.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 909dc098a75401e33584aaa02ca6b83d12a79098 Author: Quan Zhou Date: Thu May 23 10:13:34 2024 +0800 RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function [ Upstream commit c66f3b40b17d3dfc4b6abb5efde8e71c46971821 ] In the function kvm_riscv_vcpu_set_reg_isa_ext, the original code used incorrect reg_subtype labels KVM_REG_RISCV_SBI_MULTI_EN/DIS. These have been corrected to KVM_REG_RISCV_ISA_MULTI_EN/DIS respectively. Although they are numerically equivalent, the actual processing will not result in errors, but it may lead to ambiguous code semantics. Fixes: 613029442a4b ("RISC-V: KVM: Extend ONE_REG to enable/disable multiple ISA extensions") Signed-off-by: Quan Zhou Reviewed-by: Andrew Jones Link: https://lore.kernel.org/r/ff1c6771a67d660db94372ac9aaa40f51e5e0090.1716429371.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel Signed-off-by: Sasha Levin commit 5d8622f61ef10aa3e43c50ba9f5e912db6f5400e Author: Yong-Xuan Wang Date: Mon Apr 15 14:49:04 2024 +0800 RISC-V: KVM: No need to use mask when hart-index-bit is 0 [ Upstream commit 2d707b4e37f9b0c37b8b2392f91b04c5b63ea538 ] When the maximum hart number within groups is 1, hart-index-bit is set to 0. Consequently, there is no need to restore the hart ID from IMSIC addresses and hart-index-bit settings. Currently, QEMU and kvmtool do not pass correct hart-index-bit values when the maximum hart number is a power of 2, thereby avoiding this issue. Corresponding patches for QEMU and kvmtool will also be dispatched. Fixes: 89d01306e34d ("RISC-V: KVM: Implement device interface for AIA irqchip") Signed-off-by: Yong-Xuan Wang Reviewed-by: Andrew Jones Link: https://lore.kernel.org/r/20240415064905.25184-1-yongxuan.wang@sifive.com Signed-off-by: Anup Patel Signed-off-by: Sasha Levin commit b2b1043ac1f5ff874a5da1dc91d1e6c5136e7a6d Author: Chanwoo Lee Date: Fri May 24 10:59:04 2024 +0900 scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort() [ Upstream commit d53b681ce9ca7db5ef4ecb8d2cf465ae4a031264 ] An error unrelated to ufshcd_try_to_abort_task is being logged and can cause confusion. Modify ufshcd_mcq_abort() to print the result of the abort failure. For readability, return immediately instead of 'goto'. Fixes: f1304d442077 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()") Signed-off-by: Chanwoo Lee Link: https://lore.kernel.org/r/20240524015904.1116005-1-cw9316.lee@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 2f467fefdfae366987669e042647f0d7268e98d0 Author: Lingbo Kong Date: Thu May 16 10:18:54 2024 +0800 wifi: mac80211: correctly parse Spatial Reuse Parameter Set element [ Upstream commit a26d8dc5227f449a54518a8b40733a54c6600a8b ] Currently, the way of parsing Spatial Reuse Parameter Set element is incorrect and some members of struct ieee80211_he_obss_pd are not assigned. To address this issue, it must be parsed in the order of the elements of Spatial Reuse Parameter Set defined in the IEEE Std 802.11ax specification. The diagram of the Spatial Reuse Parameter Set element (IEEE Std 802.11ax -2021-9.4.2.252). ------------------------------------------------------------------------- | | | | |Non-SRG| SRG | SRG | SRG | SRG | |Element|Length| Element | SR |OBSS PD|OBSS PD|OBSS PD| BSS |Partial| | ID | | ID |Control| Max | Min | Max |Color | BSSID | | | |Extension| | Offset| Offset|Offset |Bitmap|Bitmap | ------------------------------------------------------------------------- Fixes: 1ced169cc1c2 ("mac80211: allow setting spatial reuse parameters from bss_conf") Signed-off-by: Lingbo Kong Link: https://msgid.link/20240516021854.5682-3-quic_lingbok@quicinc.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit a05018739a5e6b9dc112c95bd4c59904062c8940 Author: Emmanuel Grumbach Date: Mon May 13 13:27:14 2024 +0300 wifi: iwlwifi: mvm: don't read past the mfuart notifcation [ Upstream commit 4bb95f4535489ed830cf9b34b0a891e384d1aee4 ] In case the firmware sends a notification that claims it has more data than it has, we will read past that was allocated for the notification. Remove the print of the buffer, we won't see it by default. If needed, we can see the content with tracing. This was reported by KFENCE. Fixes: bdccdb854f2f ("iwlwifi: mvm: support MFUART dump in case of MFUART assert") Signed-off-by: Emmanuel Grumbach Reviewed-by: Johannes Berg Signed-off-by: Miri Korenblit Link: https://msgid.link/20240513132416.ba82a01a559e.Ia91dd20f5e1ca1ad380b95e68aebf2794f553d9b@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 29a18d56bd64b95bd10bda4afda512558471382a Author: Miri Korenblit Date: Mon May 13 13:27:12 2024 +0300 wifi: iwlwifi: mvm: check n_ssids before accessing the ssids [ Upstream commit 60d62757df30b74bf397a2847a6db7385c6ee281 ] In some versions of cfg80211, the ssids poinet might be a valid one even though n_ssids is 0. Accessing the pointer in this case will cuase an out-of-bound access. Fix this by checking n_ssids first. Fixes: c1a7515393e4 ("iwlwifi: mvm: add adaptive dwell support") Signed-off-by: Miri Korenblit Reviewed-by: Ilan Peer Reviewed-by: Johannes Berg Link: https://msgid.link/20240513132416.6e4d1762bf0d.I5a0e6cc8f02050a766db704d15594c61fe583d45@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit f7773fff6dda917aaca62dd0c19f09febbc31616 Author: Shahar S Matityahu Date: Fri May 10 17:06:39 2024 +0300 wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef [ Upstream commit 87821b67dea87addbc4ab093ba752753b002176a ] The driver should call iwl_dbg_tlv_free even if debugfs is not defined since ini mode does not depend on debugfs ifdef. Fixes: 68f6f492c4fa ("iwlwifi: trans: support loading ini TLVs from external file") Signed-off-by: Shahar S Matityahu Reviewed-by: Luciano Coelho Signed-off-by: Miri Korenblit Link: https://msgid.link/20240510170500.c8e3723f55b0.I5e805732b0be31ee6b83c642ec652a34e974ff10@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit bdfa7cf3281b5af340257273a90bd1219449bc24 Author: Mordechay Goodstein Date: Fri May 10 17:06:35 2024 +0300 wifi: iwlwifi: mvm: set properly mac header [ Upstream commit 0f2e9f6f21d1ff292363cdfb5bc4d492eeaff76e ] In the driver we only use skb_put* for adding data to the skb, hence data never moves and skb_reset_mac_haeder would set mac_header to the first time data was added and not to mac80211 header, fix this my using the actual len of bytes added for setting the mac header. Fixes: 3f7a9d577d47 ("wifi: iwlwifi: mvm: simplify by using SKB MAC header pointer") Signed-off-by: Mordechay Goodstein Reviewed-by: Johannes Berg Signed-off-by: Miri Korenblit Link: https://msgid.link/20240510170500.12f2de2909c3.I72a819b96f2fe55bde192a8fd31a4b96c301aa73@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 1ef2671de723a1b7cdc9967a835433479cde480e Author: Johannes Berg Date: Fri May 10 17:06:33 2024 +0300 wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64 [ Upstream commit 4a7aace2899711592327463c1a29ffee44fcc66e ] We don't actually support >64 even for HE devices, so revert back to 64. This fixes an issue where the session is refused because the queue is configured differently from the actual session later. Fixes: 514c30696fbc ("iwlwifi: add support for IEEE802.11ax") Signed-off-by: Johannes Berg Reviewed-by: Liad Kaufman Reviewed-by: Luciano Coelho Signed-off-by: Miri Korenblit Link: https://msgid.link/20240510170500.52f7b4cf83aa.If47e43adddf7fe250ed7f5571fbb35d8221c7c47@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 1fd3f32695af954f6db04198a37b086d0bb7a3f2 Author: Miri Korenblit Date: Sun May 12 15:25:00 2024 +0300 wifi: iwlwifi: mvm: don't initialize csa_work twice [ Upstream commit 92158790ce4391ce4c35d8dfbce759195e4724cb ] The initialization of this worker moved to iwl_mvm_mac_init_mvmvif but we removed only from the pre-MLD version of the add_interface callback. Remove it also from the MLD version. Fixes: 0bcc2155983e ("wifi: iwlwifi: mvm: init vif works only once") Signed-off-by: Miri Korenblit Reviewed-by: Johannes Berg Link: https://msgid.link/20240512152312.4f15b41604f0.Iec912158e5a706175531d3736d77d25adf02fba4@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit ff2b4dc81e67eb0573500ab8d70056b3142086c7 Author: Lin Ma Date: Tue May 21 15:50:59 2024 +0800 wifi: cfg80211: pmsr: use correct nla_get_uX functions [ Upstream commit ab904521f4de52fef4f179d2dfc1877645ef5f5c ] The commit 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") defines four attributes NL80211_PMSR_FTM_REQ_ATTR_ {NUM_BURSTS_EXP}/{BURST_PERIOD}/{BURST_DURATION}/{FTMS_PER_BURST} in following ways. static const struct nla_policy nl80211_pmsr_ftm_req_attr_policy[NL80211_PMSR_FTM_REQ_ATTR_MAX + 1] = { ... [NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD] = { .type = NLA_U16 }, [NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST] = NLA_POLICY_MAX(NLA_U8, 31), ... }; That is, those attributes are expected to be NLA_U8 and NLA_U16 types. However, the consumers of these attributes in `pmsr_parse_ftm` blindly all use `nla_get_u32`, which is incorrect and causes functionality issues on little-endian platforms. Hence, fix them with the correct `nla_get_u8` and `nla_get_u16` functions. Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") Signed-off-by: Lin Ma Link: https://msgid.link/20240521075059.47999-1-linma@zju.edu.cn Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 0ccc63958d8373e15a69f4f8069f3e78f7f3898a Author: Remi Pommarel Date: Tue May 21 21:47:26 2024 +0200 wifi: cfg80211: Lock wiphy in cfg80211_get_station [ Upstream commit 642f89daa34567d02f312d03e41523a894906dae ] Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()). This fixes the following kernel NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705 Hardware name: RPT (r1) (DT) Workqueue: bat_events batadv_v_elp_throughput_metric_update pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core] lr : sta_set_sinfo+0xcc/0xbd4 sp : ffff000007b43ad0 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000 Call trace: ath10k_sta_statistics+0x10/0x2dc [ath10k_core] sta_set_sinfo+0xcc/0xbd4 ieee80211_get_station+0x2c/0x44 cfg80211_get_station+0x80/0x154 batadv_v_elp_get_throughput+0x138/0x1fc batadv_v_elp_throughput_metric_update+0x1c/0xa4 process_one_work+0x1ec/0x414 worker_thread+0x70/0x46c kthread+0xdc/0xe0 ret_from_fork+0x10/0x20 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814) This happens because STA has time to disconnect and reconnect before batadv_v_elp_throughput_metric_update() delayed work gets scheduled. In this situation, ath10k_sta_state() can be in the middle of resetting arsta data when the work queue get chance to be scheduled and ends up accessing it. Locking wiphy prevents that. Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API") Signed-off-by: Remi Pommarel Reviewed-by: Nicolas Escande Acked-by: Antonio Quartulli Link: https://msgid.link/983b24a6a176e0800c01aedcd74480d9b551cb13.1716046653.git.repk@triplefau.lt Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 46b7eff59a32438439b403b8886222eb9d044342 Author: Johannes Berg Date: Wed May 22 12:41:25 2024 +0200 wifi: cfg80211: fully move wiphy work to unbound workqueue [ Upstream commit e296c95eac655008d5a709b8cf54d0018da1c916 ] Previously I had moved the wiphy work to the unbound system workqueue, but missed that when it restarts and during resume it was still using the normal system workqueue. Fix that. Fixes: 91d20ab9d9ca ("wifi: cfg80211: use system_unbound_wq for wiphy work") Reviewed-by: Miriam Rachel Korenblit Link: https://msgid.link/20240522124126.7ca959f2cbd3.I3e2a71ef445d167b84000ccf934ea245aef8d395@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 456bbb8a31e425177dc0e8d4f98728a560c20e81 Author: Remi Pommarel Date: Wed May 29 08:57:53 2024 +0200 wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup() [ Upstream commit 44c06bbde6443de206b30f513100b5670b23fc5e ] The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from softirq context. However using only spin_lock() to get sta->ps_lock in ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to take this same lock ending in deadlock. Below is an example of rcu stall that arises in such situation. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-....: (42413413 ticks this GP) idle=b154/1/0x4000000000000000 softirq=1763/1765 fqs=21206996 rcu: (t=42586894 jiffies g=2057 q=362405 ncpus=4) CPU: 2 PID: 719 Comm: wpa_supplicant Tainted: G W 6.4.0-02158-g1b062f552873 #742 Hardware name: RPT (r1) (DT) pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : queued_spin_lock_slowpath+0x58/0x2d0 lr : invoke_tx_handlers_early+0x5b4/0x5c0 sp : ffff00001ef64660 x29: ffff00001ef64660 x28: ffff000009bc1070 x27: ffff000009bc0ad8 x26: ffff000009bc0900 x25: ffff00001ef647a8 x24: 0000000000000000 x23: ffff000009bc0900 x22: ffff000009bc0900 x21: ffff00000ac0e000 x20: ffff00000a279e00 x19: ffff00001ef646e8 x18: 0000000000000000 x17: ffff800016468000 x16: ffff00001ef608c0 x15: 0010533c93f64f80 x14: 0010395c9faa3946 x13: 0000000000000000 x12: 00000000fa83b2da x11: 000000012edeceea x10: ffff0000010fbe00 x9 : 0000000000895440 x8 : 000000000010533c x7 : ffff00000ad8b740 x6 : ffff00000c350880 x5 : 0000000000000007 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff00000ac0e0e8 Call trace: queued_spin_lock_slowpath+0x58/0x2d0 ieee80211_tx+0x80/0x12c ieee80211_tx_pending+0x110/0x278 tasklet_action_common.constprop.0+0x10c/0x144 tasklet_action+0x20/0x28 _stext+0x11c/0x284 ____do_softirq+0xc/0x14 call_on_irq_stack+0x24/0x34 do_softirq_own_stack+0x18/0x20 do_softirq+0x74/0x7c __local_bh_enable_ip+0xa0/0xa4 _ieee80211_wake_txqs+0x3b0/0x4b8 __ieee80211_wake_queue+0x12c/0x168 ieee80211_add_pending_skbs+0xec/0x138 ieee80211_sta_ps_deliver_wakeup+0x2a4/0x480 ieee80211_mps_sta_status_update.part.0+0xd8/0x11c ieee80211_mps_sta_status_update+0x18/0x24 sta_apply_parameters+0x3bc/0x4c0 ieee80211_change_station+0x1b8/0x2dc nl80211_set_station+0x444/0x49c genl_family_rcv_msg_doit.isra.0+0xa4/0xfc genl_rcv_msg+0x1b0/0x244 netlink_rcv_skb+0x38/0x10c genl_rcv+0x34/0x48 netlink_unicast+0x254/0x2bc netlink_sendmsg+0x190/0x3b4 ____sys_sendmsg+0x1e8/0x218 ___sys_sendmsg+0x68/0x8c __sys_sendmsg+0x44/0x84 __arm64_sys_sendmsg+0x20/0x28 do_el0_svc+0x6c/0xe8 el0_svc+0x14/0x48 el0t_64_sync_handler+0xb0/0xb4 el0t_64_sync+0x14c/0x150 Using spin_lock_bh()/spin_unlock_bh() instead prevents softirq to raise on the same CPU that is holding the lock. Fixes: 1d147bfa6429 ("mac80211: fix AP powersave TX vs. wakeup race") Signed-off-by: Remi Pommarel Link: https://msgid.link/8e36fe07d0fbc146f89196cd47a53c8a0afe84aa.1716910344.git.repk@triplefau.lt Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 63d5f89bb5664d60edbf8cf0df911aaae8ed96a4 Author: Nicolas Escande Date: Tue May 28 16:26:05 2024 +0200 wifi: mac80211: mesh: Fix leak of mesh_preq_queue objects [ Upstream commit b7d7f11a291830fdf69d3301075dd0fb347ced84 ] The hwmp code use objects of type mesh_preq_queue, added to a list in ieee80211_if_mesh, to keep track of mpath we need to resolve. If the mpath gets deleted, ex mesh interface is removed, the entries in that list will never get cleaned. Fix this by flushing all corresponding items of the preq_queue in mesh_path_flush_pending(). This should take care of KASAN reports like this: unreferenced object 0xffff00000668d800 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419552 (age 1836.444s) hex dump (first 32 bytes): 00 1f 05 09 00 00 ff ff 00 d5 68 06 00 00 ff ff ..........h..... 8e 97 ea eb 3e b8 01 00 00 00 00 00 00 00 00 00 ....>........... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20 unreferenced object 0xffff000009051f00 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419553 (age 1836.440s) hex dump (first 32 bytes): 90 d6 92 0d 00 00 ff ff 00 d8 68 06 00 00 ff ff ..........h..... 36 27 92 e4 02 e0 01 00 00 58 79 06 00 00 ff ff 6'.......Xy..... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20 Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Signed-off-by: Nicolas Escande Link: https://msgid.link/20240528142605.1060566-1-nico.escande@gmail.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin