Bug #9796

closed

osd: crash on blacklisted watcher reconnect (dumpling)

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Won't Fix
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  -414> 2014-10-16 05:12:14.251271 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] do_pending_flush done
  -413> 2014-10-16 05:12:14.251279 7f3353d02700 20 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] op_has_sufficient_caps pool=2 (rbd ) owner=0 need_read_cap=0 need_write_cap=1 need_class_read_cap=0 need_class_write_cap=0 -> yes
  -412> 2014-10-16 05:12:14.251288 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] do_op osd_op(client.4439.0:380 rbd_header.11102ae8944a [watch add cookie 1 ver 0] 2.d791216c e68) v4 may_write -> write-ordered
  -411> 2014-10-16 05:12:14.251304 7f3353d02700  7 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] degraded d791216c/rbd_header.11102ae8944a/head//2, pushing
  -410> 2014-10-16 05:12:14.251314 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] prep_object_replica_pushes: on d791216c/rbd_header.11102ae8944a/head//2
  -409> 2014-10-16 05:12:14.251323 7f3353d02700 15 filestore(/var/lib/ceph/osd/ceph-0) getattr 2.c_head/d791216c/rbd_header.11102ae8944a/head//2 '_'
  -408> 2014-10-16 05:12:14.251370 7f3353d02700 10 filestore(/var/lib/ceph/osd/ceph-0) getattr 2.c_head/d791216c/rbd_header.11102ae8944a/head//2 '_' = 0
  -407> 2014-10-16 05:12:14.251386 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] populate_obc_watchers d791216c/rbd_header.11102ae8944a/head//2
  -406> 2014-10-16 05:12:14.251395 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active]   unconnected watcher 1,client.4439 will expire 30.000000
  -405> 2014-10-16 05:12:14.251404 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active]  -- Watch(1,client.4439, obc->ref=1) Watch()
  -404> 2014-10-16 05:12:14.251412 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active]  -- Watch(1,client.4439, obc->ref=1) disconnect
  -403> 2014-10-16 05:12:14.251418 7f3353d02700 15 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active]  -- Watch(1,client.4439, obc->ref=1) registering callback, timeout: 30
  -402> 2014-10-16 05:12:14.251443 7f3353d02700 20 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] ReplicatedPG::check_blacklisted_obc_watchers for obc d791216c/rbd_header.11102ae8944a/head//2
  -401> 2014-10-16 05:12:14.251452 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] watch: Found blacklisted watcher for 10.214.138.122:0/1008809
  -400> 2014-10-16 05:12:14.251460 7f3353d02700 10 osd.0 pg_epoch: 69 pg[2.c( v 67'28 (0'0,67'28] local-les=69 n=2 ec=1 les/c 53/53 68/68/68) [0,5] r=0 lpr=68 pi=1-67/7 lcod 67'27 mlcod 0'0 active] handle_watch_timeout obc 0x302a000

  -193> 2014-10-16 05:12:14.270614 7f3353d02700 -1 osd/Watch.cc: In function 'Context* Watch::get_delayed_cb()' thread 7f3353d02700 time 2014-10-16 05:12:14.251468
osd/Watch.cc: 290: FAILED assert(!cb)

 ceph version 0.67.11-24-g06a62db (06a62dbb552981f86e58d83c58eeb254d633e74a)
 1: (Watch::get_delayed_cb()+0xc3) [0x6d3ec3]
 2: (ReplicatedPG::handle_watch_timeout(std::tr1::shared_ptr<Watch>)+0x9e9) [0x5e54d9]
 3: (ReplicatedPG::check_blacklisted_obc_watchers(ObjectContext*)+0x3ba) [0x5e5b4a]
 4: (ReplicatedPG::populate_obc_watchers(ObjectContext*)+0x60b) [0x5e642b]
 5: (ReplicatedPG::get_object_context(hobject_t const&, bool)+0x1c4) [0x5e6dd4]
 6: (ReplicatedPG::prep_object_replica_pushes(hobject_t const&, eversion_t, int, std::map<int, std::vector<PushOp, std::allocator<PushOp> >, std::less<int>, std::allocator<std::pair<int const, std::vector<PushOp, std::allocator<PushOp> > > > >*)+0x10e) [0x5f707e]
 7: (ReplicatedPG::wait_for_degraded_object(hobject_t const&, std::tr1::shared_ptr<OpRequest>)+0x1cc) [0x5f8c7c]
 8: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x7f9) [0x60a9d9]
 9: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x6fedc9]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x651ee0]
 11: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x478) [0x668b88]
 12: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6a464c]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b6b56]
 14: (ThreadPool::WorkThread::entry()+0x10) [0x8b8b70]
 15: (()+0x7e9a) [0x7f33681c4e9a]
 16: (clone()+0x6d) [0x7f33664cd31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2014-10-15_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/550325
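
One possible reading of the log and backtrace above, offered as an assumption rather than a confirmed diagnosis: the watch registers a timeout callback at -403 ("registering callback, timeout: 30"), and then check_blacklisted_obc_watchers calls handle_watch_timeout on the same watch while the object is still degraded (-411), which reaches Watch::get_delayed_cb() and trips assert(!cb) because a callback is already pending. The sketch below is a toy model of that ordering only. ToyWatch, Callback, and replay_log_sequence are hypothetical names and are not Ceph code; the model throws instead of aborting so the failure can be observed.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a pending watch callback; not Ceph's Context.
struct Callback {
    std::string kind;
};

// Toy model of a watch that may hold at most one pending callback.
class ToyWatch {
public:
    // Models the "registering callback, timeout: 30" step seen in the log.
    void register_cb() {
        cb_ = std::make_unique<Callback>(Callback{"timeout"});
    }

    // Drops any pending callback.
    void unregister_cb() { cb_.reset(); }

    // Models the invariant behind assert(!cb): a delayed callback may only
    // be created when no other callback is pending.
    Callback* get_delayed_cb() {
        if (cb_) {
            throw std::logic_error("FAILED assert(!cb)");
        }
        cb_ = std::make_unique<Callback>(Callback{"delayed"});
        return cb_.get();
    }

private:
    std::unique_ptr<Callback> cb_;
};

// Replays the ordering suggested by the log. Returns true if the invariant
// is violated, false otherwise. The unregister_first flag is an assumption
// used only to show that clearing the pending callback avoids the violation
// in this toy model.
bool replay_log_sequence(bool unregister_first) {
    ToyWatch w;
    w.register_cb();              // disconnect: timeout callback registered
    if (unregister_first) {
        w.unregister_cb();
    }
    try {
        w.get_delayed_cb();       // blacklisted watcher on a degraded object
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model, replay_log_sequence(false) reports a violation and replay_log_sequence(true) does not. Whether dropping the pending callback first would be a safe change in the real OSD code is not established by this report.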

Actions #1

Updated by Yuri Weinstein over 9 years ago

Observed similar crash in suite:upgrade:dumpling
Run http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/
Job http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/

 *** Caught signal (Aborted) **
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: in thread 7fd2df9fb700
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz:
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: ceph version 0.67.11-24-g06a62db (06a62dbb552981f86e58d83c58eeb254d633e74a)
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 1: ceph-osd() [0x7fd04a]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 2: (()+0xfcb0) [0x7fd2f36c4cb0]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 3: (gsignal()+0x35) [0x7fd2f19070d5]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 4: (abort()+0x17b) [0x7fd2f190a83b]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd2f225969d]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 6: (()+0xb5846) [0x7fd2f2257846]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 7: (()+0xb5873) [0x7fd2f2257873]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 8: (()+0xb596e) [0x7fd2f225796e]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c654f]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x38c7) [0x60daa7]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 11: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x6fedc9]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x651ee0]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 13: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x478) [0x668b88]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 14: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6a464c]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b6b56]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 16: (ThreadPool::WorkThread::entry()+0x10) [0x8b8b70]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 17: (()+0x7e9a) [0x7fd2f36bce9a]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: 18: (clone()+0x6d) [0x7fd2f19c531d]
/a/teuthology-2014-10-18_17:00:02-upgrade:dumpling-dumpling-distro-basic-vps/556256/remote/vpm112/log/ceph-osd.4.log.gz: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Won't Fix

This is very rare and complicated to fix.
