Project

General

Profile

Bug #46284

softlockup in ceph_check_delayed_caps

Added by Jeff Layton over 2 years ago. Updated 10 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Patrick caught this in teuthology (https://pulpito.ceph.com/pdonnell-2020-06-29_23:23:33-multimds-wip-pdonnell-testing-20200629.214056-distro-basic-smithi/5190645/):

[ 8105.729195] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kworker/7:2:25281]
[ 8105.737350] Modules linked in: ceph libceph dns_resolver fscache veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter nf_tables nfnetlink bridge stp llc xfs libcrc32c sunrpc intel_rapl_msr intel_
rapl_common iTCO_wdt iTCO_vendor_support sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul ipmi_ssif joydev crc32_pclmul mei_me lpc_ich ghash_clmulni_intel i2c_i801 mei mfd_core i2c_smbus ioatdma acpi_ipmi wmi ipmi_si ipmi
_devintf ipmi_msghandler acpi_pad acpi_power_meter ixgbe igb mdio ptp crc32c_intel nvme i2c_algo_bit pps_core nvme_core dca
[ 8105.794266] irq event stamp: 3617332
[ 8105.798203] hardirqs last  enabled at (3617331): [<ffffffff81c00cc2>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8105.808941] hardirqs last disabled at (3617332): [<ffffffff819f0832>] idtentry_enter_cond_rcu+0x22/0x60
[ 8105.818979] softirqs last  enabled at (3563636): [<ffffffff81e00346>] __do_softirq+0x346/0x454
[ 8105.828231] softirqs last disabled at (3563615): [<ffffffff81c010af>] asm_call_on_stack+0xf/0x20
[ 8105.837663] CPU: 7 PID: 25281 Comm: kworker/7:2 Not tainted 5.8.0-rc1-ceph-g556853900c89 #1
[ 8105.846654] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
[ 8105.854717] Workqueue: events delayed_work [ceph]
[ 8105.859782] RIP: 0010:lock_release+0x130/0x270
[ 8105.864580] Code: fa ff ff 0b 05 89 a5 65 02 0f 84 d1 00 00 00 65 48 8b 04 25 00 7e 01 00 83 a8 44 09 00 00 01 0f 85 02 01 00 00 48 89 ef 57 9d <0f> 1f 44 00 00 48 8b 44 24 10 65 48 33 04 25 28 00 00 00 0f 85 1e
[ 8105.884294] RSP: 0018:ffffc90003ff3d80 EFLAGS: 00000246
[ 8105.889880] RAX: ffff888854e30000 RBX: ffff888854e30000 RCX: ffffc90003ff3d8c
[ 8105.897380] RDX: 0000000000000002 RSI: ffff888839dcc408 RDI: 0000000000000246
[ 8105.904883] RBP: 0000000000000246 R08: 0000000000000000 R09: 32124568c4c46b48
[ 8105.912380] R10: 0000000000000000 R11: ffff888854e30948 R12: ffff888839dcc408
[ 8105.919874] R13: ffffffffa07d5783 R14: 0000000000000003 R15: ffff8887416e7a10
[ 8105.927375] FS:  0000000000000000(0000) GS:ffff88885fdc0000(0000) knlGS:0000000000000000
[ 8105.936102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8105.942204] CR2: 00007f9eadfec960 CR3: 0000000002610002 CR4: 00000000003606e0
[ 8105.949706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8105.957203] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8105.964699] Call Trace:
[ 8105.967510]  _raw_spin_unlock+0x17/0x30
[ 8105.971713]  ceph_check_delayed_caps+0x83/0x130 [ceph]
[ 8105.977224]  delayed_work+0x18b/0x2d0 [ceph]
[ 8105.981857]  process_one_work+0x242/0x580
[ 8105.986226]  ? worker_thread+0xc1/0x390
[ 8105.990430]  worker_thread+0x3c/0x390
[ 8105.994453]  ? process_one_work+0x580/0x580
[ 8105.998995]  kthread+0x133/0x150
[ 8106.002576]  ? kthread_unpark+0x60/0x60
[ 8106.006767]  ret_from_fork+0x1f/0x30

I wonder if ceph_check_delayed_caps could be subject to livelock if there are continually new entries being put onto the list. Possibly we need to cap its duty cycle?

History

#1 Updated by Jeff Layton over 2 years ago

More of the log:

[ 8133.729433] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kworker/7:2:25281]
[ 8133.737568] Modules linked in: ceph libceph dns_resolver fscache veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter nf_tables
 nfnetlink bridge stp llc xfs libcrc32c sunrpc intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pc
lmul ipmi_ssif joydev crc32_pclmul mei_me lpc_ich ghash_clmulni_intel i2c_i801 mei mfd_core i2c_smbus ioatdma acpi_ipmi wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_me
ter ixgbe igb mdio ptp crc32c_intel nvme i2c_algo_bit pps_core nvme_core dca
[ 8133.794492] irq event stamp: 3672878
[ 8133.798428] hardirqs last  enabled at (3672877): [<ffffffff81c00cc2>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8133.809171] hardirqs last disabled at (3672878): [<ffffffff819f0832>] idtentry_enter_cond_rcu+0x22/0x60
[ 8133.819213] softirqs last  enabled at (3563636): [<ffffffff81e00346>] __do_softirq+0x346/0x454
[ 8133.828474] softirqs last disabled at (3563615): [<ffffffff81c010af>] asm_call_on_stack+0xf/0x20
[ 8133.837905] CPU: 7 PID: 25281 Comm: kworker/7:2 Tainted: G             L    5.8.0-rc1-ceph-g556853900c89 #1
[ 8133.848289] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
[ 8133.856341] Workqueue: events delayed_work [ceph]
[ 8133.861409] RIP: 0010:igrab+0x29/0x50
[ 8133.865427] Code: 00 0f 1f 44 00 00 55 48 8d af 88 00 00 00 53 48 89 fb 48 89 ef e8 57 7b 6c 00 f6 83 d8 00 00 00 30 75 15 f0 ff 83 08 02 00 00 <48> 89 ef e8 5f 7d 6c 00 48 89 d8 5b 5d c3 48 89 ef e8 51 7d 6c 00
[ 8133.885142] RSP: 0018:ffffc90003ff3dc0 EFLAGS: 00000206
[ 8133.898649] RAX: ffff888854e30000 RBX: ffff8887416d3b10 RCX: ffffffff83447fd0
[ 8133.906145] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8887416d3b98
[ 8133.913645] RBP: ffff8887416d3b98 R08: 0000000000000001 R09: 32124568c4c46b48
[ 8133.921141] R10: 0000000000000000 R11: ffff888854e30948 R12: ffff888839dcc3f0
[ 8133.928637] R13: ffff888839dcc3e0 R14: ffffffffa07efbff R15: ffff8887416e7a10
[ 8133.936133] FS:  0000000000000000(0000) GS:ffff88885fdc0000(0000) knlGS:0000000000000000
[ 8133.944863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8133.950973] CR2: 00007f9eadfec960 CR3: 0000000002610002 CR4: 00000000003606e0
[ 8133.958470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8133.965965] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8133.973461] Call Trace:
[ 8133.976278]  ceph_check_delayed_caps+0x78/0x130 [ceph]
[ 8133.981793]  delayed_work+0x18b/0x2d0 [ceph]
[ 8133.986439]  process_one_work+0x242/0x580
[ 8133.990809]  ? worker_thread+0xc1/0x390
[ 8133.995011]  worker_thread+0x3c/0x390
[ 8133.999034]  ? process_one_work+0x580/0x580
[ 8134.003577]  kthread+0x133/0x150
[ 8134.007159]  ? kthread_unpark+0x60/0x60
[ 8134.011353]  ret_from_fork+0x1f/0x30
[ 8138.340320] libceph: mon1 (1)172.21.15.137:6789 session lost, hunting for new mon
[ 8138.348739] ceph: mds0 caps still stale
[ 8138.353044] ceph: mds2 caps still stale
[ 8138.357250] ceph: mds3 caps still stale
[ 8138.361513] ceph: mds5 caps still stale
[ 8138.365715] ceph: mds4 caps still stale
[ 8138.369911] ceph: mds6 caps still stale
[ 8138.374105] ceph: mds7 caps still stale
[ 8138.378301] ceph: mds8 caps still stale
[ 8138.382493] ceph: mds1 caps still stale
[ 8138.428953] libceph: mon0 (1)172.21.15.101:6789 session established
[ 8145.406621] ceph: mds0 caps stale
[ 8145.410427] ceph: mds1 caps stale
[ 8145.414150] ceph: mds2 caps stale
[ 8145.418018] ceph: mds3 caps stale
[ 8145.421724] ceph: mds4 caps stale
[ 8145.425426] ceph: mds5 caps stale
[ 8145.429129] ceph: mds6 caps stale
[ 8145.432903] ceph: mds7 caps stale
[ 8145.436614] ceph: mds8 caps stale
[ 8145.442509] ceph: mds4 caps renewed
[ 8145.446500] ceph: mds7 caps renewed
[ 8145.450374] ceph: mds6 caps renewed
[ 8145.454275] ceph: mds3 caps renewed
[ 8145.458187] ceph: mds0 caps renewed
[ 8145.462042] ceph: mds8 caps renewed
[ 8145.465905] ceph: mds5 caps renewed
[ 8145.469772] ceph: mds1 caps renewed
[ 8145.473638] ceph: mds2 caps renewed
[ 9387.328654] ceph-brx: port 1(brx.0) entered disabled state
[ 9387.459583] device brx.0 left promiscuous mode
[ 9387.464415] ceph-brx: port 1(brx.0) entered disabled state
[ 9389.614869] ceph-brx: port 2(brx.1) entered disabled state
[ 9389.731139] device brx.1 left promiscuous mode
[ 9389.736160] ceph-brx: port 2(brx.1) entered disabled state

Was there a network problem here that could have contributed to a softlockup? We lost contact with the mon at that time. We probably need to do some work down in the bowels of the cap handling to avoid locking up when there are, in any case.

#2 Updated by Jeff Layton over 2 years ago

git blame:

afcdaea3f2a78 (Sage Weil          2009-10-14 14:27:38 -0700 4186) void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4187) {
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4188)       struct inode *inode;
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4189)       struct ceph_inode_info *ci;
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4190) 
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4191)       dout("check_delayed_caps\n");
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4192)       while (1) {
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4193)               spin_lock(&mdsc->cap_delay_lock);
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4194)               if (list_empty(&mdsc->cap_delay_list))
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4195)                       break;
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4196)               ci = list_first_entry(&mdsc->cap_delay_list,
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4197)                                     struct ceph_inode_info,
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4198)                                     i_cap_delay_list);
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4199)               if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4200)                   time_before(jiffies, ci->i_hold_caps_max))
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4201)                       break;
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4202)               list_del_init(&ci->i_cap_delay_list);
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4203) 
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4204)               inode = igrab(&ci->vfs_inode);
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4205)               spin_unlock(&mdsc->cap_delay_lock);
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4206) 
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4207)               if (inode) {
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4208)                       dout("check_delayed_caps on %p\n", inode);
a0d93e327fc42 (Yan, Zheng         2020-03-05 20:21:01 +0800 4209)                       ceph_check_caps(ci, 0, NULL);
3e1d0452edcee (Yan, Zheng         2019-05-18 20:39:55 +0800 4210)                       /* avoid calling iput_final() in tick thread */
3e1d0452edcee (Yan, Zheng         2019-05-18 20:39:55 +0800 4211)                       ceph_async_iput(inode);
4b9f2042fd2a9 (Yan, Zheng         2017-06-27 17:17:24 +0800 4212)               }
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4213)       }
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4214)       spin_unlock(&mdsc->cap_delay_lock);
a8599bd821d08 (Sage Weil          2009-10-06 11:31:12 -0700 4215) }

What I suspect happened is that we ended up calling __cap_delay_requeue in ceph_check_caps. That would requeue the cap back to the list after it has been dequeued. I think we may need a livelock avoidance mechanism here.

#3 Updated by Jeff Layton over 1 year ago

A patch for this has now been merged into the testing branch. See:

e869cbcedf1a ceph: reduce contention in ceph_check_delayed_caps()

#4 Updated by Jeff Layton over 1 year ago

  • Status changed from New to Resolved

Also available in: Atom PDF