Bug #63324
openkclient: watchdog: BUG: soft lockup - CPU#11 stuck for 25738s! [kworker/11:1:1724873]
% Done: 0%
Regression: No
Severity: 3 - minor
Description
The kworker seems to fall into an infinite loop while iterating over the mdsc->snap_flush_list list:
```
2023-10-25T14:15:38.879533+0000 kernel: watchdog: BUG: soft lockup - CPU#11 stuck for 25712s! [kworker/11:1:1724873]
2023-10-25T14:15:38.879724+0000 kernel: Modules linked in: ceph fscache netfs xt_TCPMSS xt_set ip_set_hash_net ip_set nbd rbd libceph dns_resolver xt_addrtype veth xt_mark nf_conntrack_netlink geneve ip6_udp_tunnel udp_tunnel ipt_REJECT nf_reject_ipv4 xt_nat nft_chain_nat xt_CT xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink_cttimeout nfnetlink rfkill openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency_common vmwgfx isst_if_mbox_msr isst_if_common drm_ttm_helper ttm nfit libnvdimm drm_kms_helper rapl vmw_balloon syscopyarea sysfillrect sysimgblt pcspkr vmw_vmci fb_sys_fops i2c_piix4 joydev drm ip_tables xfs libcrc32c dm_multipath ata_generic sd_mod nvme_tcp sg nvme_fabrics nvme ata_piix crct10dif_pclmul nvme_core crc32_pclmul crc32c_intel libata nvme_common vmxnet3 ghash_clmulni_intel t10_pi vmw_pvscsi serio_raw
2023-10-25T14:15:38.898655+0000 kernel: dm_mirror dm_region_hash dm_log dm_mod fuse
2023-10-25T14:15:38.898743+0000 kernel: CPU: 11 PID: 1724873 Comm: kworker/11:1 Tainted: G L -------- --- 5.14.0-284.36.1.el9_2.x86_64 #1
2023-10-25T14:15:38.898774+0000 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
2023-10-25T14:15:38.898799+0000 kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
2023-10-25T14:15:38.898825+0000 kernel: RIP: 0010:_raw_spin_lock+0x17/0x30
2023-10-25T14:15:38.898863+0000 kernel: Code: cc cc e9 bc f1 62 ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 65 ff 05 34 24 4e 58 31 c0 ba 01 00 00 00 f0 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e8 2b ed 62 ff 66 90 c3 cc cc cc cc 0f
2023-10-25T14:15:38.898892+0000 kernel: RSP: 0018:ffffb90d1c25fbf8 EFLAGS: 00000246
2023-10-25T14:15:38.898921+0000 kernel: RAX: 0000000000000000 RBX: ffff973c71dc5f80 RCX: 00000000802a0028
2023-10-25T14:15:38.898952+0000 kernel: RDX: 0000000000000001 RSI: ffffb90d1c25fc78 RDI: ffff973c71dc6230
2023-10-25T14:15:38.898982+0000 kernel: RBP: 0000000000000000 R08: ffff973c71dc6518 R09: ffffb90d1c25fbe0
2023-10-25T14:15:38.899029+0000 kernel: R10: 0000000000000000 R11: 0000000000002e4c R12: ffff973c71dc6230
2023-10-25T14:15:38.899077+0000 kernel: R13: ffff97366aae4808 R14: ffff97366aae4800 R15: 0000000000000002
2023-10-25T14:15:38.899148+0000 kernel: FS:  0000000000000000(0000) GS:ffff973e1fcc0000(0000) knlGS:0000000000000000
2023-10-25T14:15:38.899176+0000 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-10-25T14:15:38.899208+0000 kernel: CR2: 000000c0008c9ae0 CR3: 0000000140264003 CR4: 00000000007706e0
2023-10-25T14:15:38.899234+0000 kernel: PKRU: 55555554
2023-10-25T14:15:38.899257+0000 kernel: Call Trace:
2023-10-25T14:15:38.899283+0000 kernel:  <TASK>
2023-10-25T14:15:38.899305+0000 kernel:  ceph_flush_snaps+0x4f/0x1e0 [ceph]
2023-10-25T14:15:38.899330+0000 kernel:  ceph_handle_snap+0x2c2/0x4f0 [ceph]
2023-10-25T14:15:38.899365+0000 kernel:  ? tcp_recvmsg_locked+0x200/0x900
2023-10-25T14:15:38.899394+0000 kernel:  mds_dispatch+0x19d/0x1e0 [ceph]
2023-10-25T14:15:38.899416+0000 kernel:  ceph_con_process_message+0x77/0x150 [libceph]
2023-10-25T14:15:38.899440+0000 kernel:  ? release_sock+0x40/0x90
2023-10-25T14:15:38.899465+0000 kernel:  process_message+0xf/0x110 [libceph]
2023-10-25T14:15:38.899497+0000 kernel:  populate_in_iter+0x45e/0x660 [libceph]
2023-10-25T14:15:38.899522+0000 kernel:  ? inet_recvmsg+0x5c/0x140
2023-10-25T14:15:38.899551+0000 kernel:  ceph_con_v2_try_read+0x101/0x210 [libceph]
2023-10-25T14:15:38.899606+0000 kernel:  ? ceph_con_v2_try_write+0x1aa/0x4c0 [libceph]
2023-10-25T14:15:38.899632+0000 kernel:  ceph_con_workfn+0x175/0x440 [libceph]
2023-10-25T14:15:38.899657+0000 kernel:  process_one_work+0x1e2/0x3b0
2023-10-25T14:15:38.899681+0000 kernel:  ? rescuer_thread+0x390/0x390
2023-10-25T14:15:38.899714+0000 kernel:  worker_thread+0x50/0x3a0
2023-10-25T14:15:38.899739+0000 kernel:  ? rescuer_thread+0x390/0x390
2023-10-25T14:15:38.899762+0000 kernel:  kthread+0xd6/0x100
2023-10-25T14:15:38.899793+0000 kernel:  ? kthread_complete_and_exit+0x20/0x20
2023-10-25T14:15:38.899819+0000 kernel:  ret_from_fork+0x1f/0x30
2023-10-25T14:15:38.899842+0000 kernel:  </TASK>
2023-10-25T14:16:06.879512+0000 kernel: watchdog: BUG: soft lockup - CPU#11 stuck for 25738s! [kworker/11:1:1724873]
2023-10-25T14:16:06.879675+0000 kernel: Modules linked in: ceph fscache netfs xt_TCPMSS xt_set ip_set_hash_net ip_set nbd rbd libceph dns_resolver xt_addrtype veth xt_mark nf_conntrack_netlink geneve ip6_udp_tunnel udp_tunnel ipt_REJECT nf_reject_ipv4 xt_nat nft_chain_nat xt_CT xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink_cttimeout nfnetlink rfkill openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency_common vmwgfx isst_if_mbox_msr isst_if_common drm_ttm_helper ttm nfit libnvdimm drm_kms_helper rapl vmw_balloon syscopyarea sysfillrect sysimgblt pcspkr vmw_vmci fb_sys_fops i2c_piix4 joydev drm ip_tables xfs libcrc32c dm_multipath ata_generic sd_mod nvme_tcp sg nvme_fabrics nvme ata_piix crct10dif_pclmul nvme_core crc32_pclmul crc32c_intel libata nvme_common vmxnet3 ghash_clmulni_intel t10_pi vmw_pvscsi serio_raw
2023-10-25T14:16:06.879774+0000 kernel: dm_mirror dm_region_hash dm_log dm_mod fuse
2023-10-25T14:16:06.879822+0000 kernel: CPU: 11 PID: 1724873 Comm: kworker/11:1 Tainted: G L -------- --- 5.14.0-284.36.1.el9_2.x86_64 #1
2023-10-25T14:16:06.879850+0000 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
2023-10-25T14:16:06.879871+0000 kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
2023-10-25T14:16:06.879895+0000 kernel: RIP: 0010:_raw_spin_unlock+0x1f/0x30
2023-10-25T14:16:06.879919+0000 kernel: Code: e9 26 7e 5c ff 66 0f 1f 44 00 00 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 65 ff 0d 7d 23 4e 58 74 05 c3 cc cc cc cc 0f 1f 44 00 00 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00
2023-10-25T14:16:06.879947+0000 kernel: RSP: 0018:ffffb90d1c25fc38 EFLAGS: 00000246
2023-10-25T14:16:06.879970+0000 kernel: RAX: 0000000000000002 RBX: ffff973749586000 RCX: 00000000802a0028
2023-10-25T14:16:06.879997+0000 kernel: RDX: 0000000000000001 RSI: ffff973c71dc6008 RDI: ffff9737495861c8
2023-10-25T14:16:06.880022+0000 kernel: RBP: ffff973c71dc5f80 R08: ffff973c71dc6518 R09: ffffb90d1c25fbe0
2023-10-25T14:16:06.880059+0000 kernel: R10: 0000000000000000 R11: 0000000000002e4c R12: ffff9737495861c8
2023-10-25T14:16:06.880098+0000 kernel: R13: ffff9737495861b8 R14: ffff973645342600 R15: 0000000000000002
2023-10-25T14:16:06.880137+0000 kernel: FS:  0000000000000000(0000) GS:ffff973e1fcc0000(0000) knlGS:0000000000000000
2023-10-25T14:16:06.880177+0000 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-10-25T14:16:06.880204+0000 kernel: CR2: 000000c0008c9ae0 CR3: 0000000140264003 CR4: 00000000007706e0
2023-10-25T14:16:06.880235+0000 kernel: PKRU: 55555554
2023-10-25T14:16:06.880263+0000 kernel: Call Trace:
2023-10-25T14:16:06.880306+0000 kernel:  <TASK>
2023-10-25T14:16:06.880331+0000 kernel:  ceph_handle_snap+0x2b5/0x4f0 [ceph]
2023-10-25T14:16:06.880352+0000 kernel:  ? tcp_recvmsg_locked+0x200/0x900
2023-10-25T14:16:06.880378+0000 kernel:  mds_dispatch+0x19d/0x1e0 [ceph]
2023-10-25T14:16:06.880401+0000 kernel:  ceph_con_process_message+0x77/0x150 [libceph]
2023-10-25T14:16:06.880423+0000 kernel:  ? release_sock+0x40/0x90
2023-10-25T14:16:06.880444+0000 kernel:  process_message+0xf/0x110 [libceph]
2023-10-25T14:16:06.880468+0000 kernel:  populate_in_iter+0x45e/0x660 [libceph]
2023-10-25T14:16:06.880492+0000 kernel:  ? inet_recvmsg+0x5c/0x140
2023-10-25T14:16:06.880517+0000 kernel:  ceph_con_v2_try_read+0x101/0x210 [libceph]
2023-10-25T14:16:06.880539+0000 kernel:  ? ceph_con_v2_try_write+0x1aa/0x4c0 [libceph]
2023-10-25T14:16:06.880575+0000 kernel:  ceph_con_workfn+0x175/0x440 [libceph]
2023-10-25T14:16:06.880611+0000 kernel:  process_one_work+0x1e2/0x3b0
2023-10-25T14:16:06.880635+0000 kernel:  ? rescuer_thread+0x390/0x390
2023-10-25T14:16:06.880659+0000 kernel:  worker_thread+0x50/0x3a0
2023-10-25T14:16:06.880679+0000 kernel:  ? rescuer_thread+0x390/0x390
2023-10-25T14:16:06.880700+0000 kernel:  kthread+0xd6/0x100
2023-10-25T14:16:06.880722+0000 kernel:  ? kthread_complete_and_exit+0x20/0x20
2023-10-25T14:16:06.880742+0000 kernel:  ret_from_fork+0x1f/0x30
2023-10-25T14:16:06.880763+0000 kernel:  </TASK>
```
The code:
```c
/*
 * Send any cap_snaps that are queued for flush.  Try to carry
 * s_mutex across multiple snap flushes to avoid locking overhead.
 *
 * Caller holds no locks.
 */
static void flush_snaps(struct ceph_mds_client *mdsc)
{
	struct ceph_client *cl = mdsc->fsc->client;
	struct ceph_inode_info *ci;
	struct inode *inode;
	struct ceph_mds_session *session = NULL;

	doutc(cl, "begin\n");
	spin_lock(&mdsc->snap_flush_lock);
	while (!list_empty(&mdsc->snap_flush_list)) {
		ci = list_first_entry(&mdsc->snap_flush_list,
				      struct ceph_inode_info, i_snap_flush_item);
		inode = &ci->netfs.inode;
		ihold(inode);
		spin_unlock(&mdsc->snap_flush_lock);
		ceph_flush_snaps(ci, &session);
		iput(inode);
		spin_lock(&mdsc->snap_flush_lock);
	}
	spin_unlock(&mdsc->snap_flush_lock);

	ceph_put_mds_session(session);
	doutc(cl, "done\n");
}
```
This is a similar issue to https://tracker.ceph.com/issues/63298.