Project

General

Profile

Actions

Bug #46284

closed

softlockup in ceph_check_delayed_caps

Added by Jeff Layton almost 4 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Patrick caught this in teuthology (https://pulpito.ceph.com/pdonnell-2020-06-29_23:23:33-multimds-wip-pdonnell-testing-20200629.214056-distro-basic-smithi/5190645/):

[ 8105.729195] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kworker/7:2:25281]
[ 8105.737350] Modules linked in: ceph libceph dns_resolver fscache veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter nf_tables nfnetlink bridge stp llc xfs libcrc32c sunrpc intel_rapl_msr intel_
rapl_common iTCO_wdt iTCO_vendor_support sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul ipmi_ssif joydev crc32_pclmul mei_me lpc_ich ghash_clmulni_intel i2c_i801 mei mfd_core i2c_smbus ioatdma acpi_ipmi wmi ipmi_si ipmi
_devintf ipmi_msghandler acpi_pad acpi_power_meter ixgbe igb mdio ptp crc32c_intel nvme i2c_algo_bit pps_core nvme_core dca
[ 8105.794266] irq event stamp: 3617332
[ 8105.798203] hardirqs last  enabled at (3617331): [<ffffffff81c00cc2>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8105.808941] hardirqs last disabled at (3617332): [<ffffffff819f0832>] idtentry_enter_cond_rcu+0x22/0x60
[ 8105.818979] softirqs last  enabled at (3563636): [<ffffffff81e00346>] __do_softirq+0x346/0x454
[ 8105.828231] softirqs last disabled at (3563615): [<ffffffff81c010af>] asm_call_on_stack+0xf/0x20
[ 8105.837663] CPU: 7 PID: 25281 Comm: kworker/7:2 Not tainted 5.8.0-rc1-ceph-g556853900c89 #1
[ 8105.846654] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
[ 8105.854717] Workqueue: events delayed_work [ceph]
[ 8105.859782] RIP: 0010:lock_release+0x130/0x270
[ 8105.864580] Code: fa ff ff 0b 05 89 a5 65 02 0f 84 d1 00 00 00 65 48 8b 04 25 00 7e 01 00 83 a8 44 09 00 00 01 0f 85 02 01 00 00 48 89 ef 57 9d <0f> 1f 44 00 00 48 8b 44 24 10 65 48 33 04 25 28 00 00 00 0f 85 1e
[ 8105.884294] RSP: 0018:ffffc90003ff3d80 EFLAGS: 00000246
[ 8105.889880] RAX: ffff888854e30000 RBX: ffff888854e30000 RCX: ffffc90003ff3d8c
[ 8105.897380] RDX: 0000000000000002 RSI: ffff888839dcc408 RDI: 0000000000000246
[ 8105.904883] RBP: 0000000000000246 R08: 0000000000000000 R09: 32124568c4c46b48
[ 8105.912380] R10: 0000000000000000 R11: ffff888854e30948 R12: ffff888839dcc408
[ 8105.919874] R13: ffffffffa07d5783 R14: 0000000000000003 R15: ffff8887416e7a10
[ 8105.927375] FS:  0000000000000000(0000) GS:ffff88885fdc0000(0000) knlGS:0000000000000000
[ 8105.936102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8105.942204] CR2: 00007f9eadfec960 CR3: 0000000002610002 CR4: 00000000003606e0
[ 8105.949706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8105.957203] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8105.964699] Call Trace:
[ 8105.967510]  _raw_spin_unlock+0x17/0x30
[ 8105.971713]  ceph_check_delayed_caps+0x83/0x130 [ceph]
[ 8105.977224]  delayed_work+0x18b/0x2d0 [ceph]
[ 8105.981857]  process_one_work+0x242/0x580
[ 8105.986226]  ? worker_thread+0xc1/0x390
[ 8105.990430]  worker_thread+0x3c/0x390
[ 8105.994453]  ? process_one_work+0x580/0x580
[ 8105.998995]  kthread+0x133/0x150
[ 8106.002576]  ? kthread_unpark+0x60/0x60
[ 8106.006767]  ret_from_fork+0x1f/0x30

I wonder if ceph_check_delayed_caps could be subject to livelock if there are continually new entries being put onto the list. Possibly we need to cap its duty cycle?

Actions

Also available in: Atom PDF