
Bug #23465

"Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attribute 'split'") in powercycle

Added by Yuri Weinstein about 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
powercycle
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I see the latest commit https://github.com/ceph/ceph/commit/c6760eba50860d40e25483c3e4cee772f3ad4468#diff-289c6ff15fd25acee61b31126e02dd06
but I'm unsure how it could break this.

Run: http://pulpito.ceph.com/yuriw-2018-03-26_17:15:42-powercycle-wip_master-yuriw_3.24.18-distro-basic-smithi/
Jobs: '2325406', '2325404', '2325405'
Logs: yuriw-2018-03-26_17:15:42-powercycle-wip_master-yuriw_3.24.18-distro-basic-smithi/2325406/

2018-03-26T17:42:37.491 ERROR:teuthology.run_tasks:Manager failed: internal.archive
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 159, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/__init__.py", line 365, in archive
    fetch_binaries_for_coredumps(path, rem)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/__init__.py", line 312, in fetch_binaries_for_coredumps
    dump_program = dump_out.split("from '")[1].split(' ')[0]
AttributeError: 'tuple' object has no attribute 'split'


Related issues

Related to RADOS - Bug #38594: mimic: common/Mutex.cc: 110: FAILED assert(r == 0) in powercycle New 03/05/2019

History

#1 Updated by Josh Durgin about 6 years ago

This isn't related to that suite commit. Running 'file' manually on this core file returns "remote/smithi150/coredump/1522085413.12350.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'ceph-osd -f --cluster ceph -i 1'".
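The traceback in the description shows `dump_out` arriving as a tuple rather than a string, so `.split()` fails before the program name can be parsed out of the `file` output quoted above. A minimal sketch of a defensive fix, assuming (hypothetically, not confirmed by the log) that when a tuple is returned its first element carries the stdout text:

```python
def extract_dump_program(dump_out):
    """Pull the program name out of `file` output for a core dump."""
    if isinstance(dump_out, tuple):
        # Assumption: the tuple's first element is the stdout text,
        # e.g. (stdout, stderr). Unwrap it before parsing.
        dump_out = dump_out[0]
    # `file` output ends with e.g.: ... from 'ceph-osd -f --cluster ceph -i 1'
    return dump_out.split("from '")[1].split(' ')[0]

sample = ("remote/smithi150/coredump/1522085413.12350.core: ELF 64-bit LSB "
          "core file x86-64, version 1 (SYSV), SVR4-style, "
          "from 'ceph-osd -f --cluster ceph -i 1'")
print(extract_dump_program(sample))        # prints: ceph-osd
print(extract_dump_program((sample, '')))  # tuple input now handled too
```

With this normalization, `fetch_binaries_for_coredumps` would parse the program name either way instead of raising the AttributeError and masking the underlying OSD crash.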

The crash appears to be a race during shutdown:

2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 crt=0'0 active mbc={}] on_shutdown
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] cancel_copy_ops
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] cancel_flush_ops
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] cancel_proxy_ops
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] clear_backoffs 
2018-03-26 17:30:13.185 7f1e4ace4700 20 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] exit NotTrimming
2018-03-26 17:30:13.185 7f1e4ace4700 20 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] enter NotTrimming
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] on_change
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] clear_async_reads
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] clear_primary_state
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] release_backoffs [1:80000000::::head,1:a0000000::::head)
2018-03-26 17:30:13.185 7f1e4ace4700 20 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] agent_stop
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] cancel_recovery
2018-03-26 17:30:13.185 7f1e4ace4700 10 osd.1 pg_epoch: 82 pg[1.1( empty local-lis/les=9/10 n=0 ec=9/9 lis/c 9/9 les/c/f 10/10/0 9/9/9) [0,1] r=1 lpr=10 DELETING crt=0'0 active mbc={}] clear_recovery_state
2018-03-26 17:30:13.185 7f1e3325f700 20 osd.1 op_wq(6) _process empty q, waiting
2018-03-26 17:30:13.185 7f1e35a64700 20 osd.1 op_wq(1) _process empty q, waiting
2018-03-26 17:30:13.185 7f1e3125b700 20 osd.1 op_wq(2) _process empty q, waiting
2018-03-26 17:30:13.185 7f1e31a5c700 20 osd.1 op_wq(1) _process empty q, waiting
2018-03-26 17:30:13.185 7f1e30a5a700 20 osd.1 op_wq(3) _process empty q, waiting
2018-03-26 17:30:13.185 7f1e3ca72700 -1 /build/ceph-13.0.1-3240-gdcc62bb/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f1e3ca72700 time 2018-03-26 17:30:13.186776
/build/ceph-13.0.1-3240-gdcc62bb/src/common/Mutex.cc: 110: FAILED assert(r == 0)

 ceph version 13.0.1-3240-gdcc62bb (dcc62bb2d0243a458251a2c80b510155ad4bfa5e) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f1e525ce602]
 2: (()+0x2ce7d7) [0x7f1e525ce7d7]
 3: (Mutex::Lock(bool)+0x1c3) [0x7f1e525a1833]
 4: (TrackedOp::mark_event(char const*, utime_t)+0x69) [0x55b6b71b60a9]
 5: (ReplicatedBackend::op_commit(ReplicatedBackend::InProgressOp*)+0x6d) [0x55b6b720ea0d]
 6: (Context::complete(int)+0x9) [0x55b6b6f77ef9]
 7: (PrimaryLogPG::BlessedContext::finish(int)+0x56) [0x55b6b70f4c66]
 8: (Context::complete(int)+0x9) [0x55b6b6f77ef9]
 9: (Finisher::finisher_thread_entry()+0x1a7) [0x7f1e525ccc67]
 10: (()+0x76ba) [0x7f1e5108d6ba]
 11: (clone()+0x6d) [0x7f1e508b641d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#2 Updated by Yuri Weinstein about 6 years ago

  • Subject changed from "AttributeError: 'tuple' object has no attribute 'split'" in powercycle to "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attribute 'split'") in powercycle

#3 Updated by Greg Farnum almost 6 years ago

  • Project changed from Ceph to RADOS

#4 Updated by Josh Durgin almost 6 years ago

  • Assignee deleted (Sage Weil)
  • Priority changed from Urgent to Normal

#5 Updated by Neha Ojha about 5 years ago

  • Related to Bug #38594: mimic: common/Mutex.cc: 110: FAILED assert(r == 0) in powercycle added
