Bug #15829

closed

RBD MIRROR: assert fails in ImageReplayer, daemon aborts

Added by Jon Bernard almost 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

   -15> 2016-05-10 15:46:18.315359 7f87d02c9700  1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1158 ==== osd_op_reply(7085 rbd_mirroring [call] v0'0 uv10 ondisk = -2 ((2) No such file or directory)) v7 ==== 133+0+0 (1179139191 0 0) 0x7f87a4215100 con 0x7f87b804ba90
   -14> 2016-05-10 15:46:18.361126 7f87a37fe700  5 rbd::mirror::image_replayer::BootstrapRequest: 0x7f87bc002510 handle_open_remote_image: remote image is not primary -- skipping image replay
   -13> 2016-05-10 15:46:18.368207 7f87d0ccc700  5 rbd-mirror: ImageReplayer[0/101a238e1f29]::handle_bootstrap: remote image is non-primary or local image is primary
   -12> 2016-05-10 15:46:27.944120 7f87c97fa700 10 monclient: tick
   -11> 2016-05-10 15:46:37.945371 7f87c97fa700 10 monclient: tick
   -10> 2016-05-10 15:46:41.983970 7f87e0245c80  1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7086 0.b74753e6 rbd_mirroring [call rbd.mirror_mode_get] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x564df3b80720 con 0x7f87b804ba90
    -9> 2016-05-10 15:46:41.985352 7f87d02c9700  1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1159 ==== osd_op_reply(7086 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+4 (4269282549 0 3211207553) 0x7f87a4215100 con 0x7f87b804ba90
    -8> 2016-05-10 15:46:41.985458 7f87e0245c80  1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7087 0.b74753e6 rbd_mirroring [call rbd.mirror_peer_list] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x564df3bad530 con 0x7f87b804ba90
    -7> 2016-05-10 15:46:41.986288 7f87d02c9700  1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1160 ==== osd_op_reply(7087 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+87 (2619010202 0 1398442284) 0x7f87a4215100 con 0x7f87b804ba90
    -6> 2016-05-10 15:46:47.946763 7f87c97fa700 10 monclient: tick
    -5> 2016-05-10 15:46:48.314780 7f87a3fff700  1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7088 0.b74753e6 rbd_mirroring [call rbd.mirror_uuid_get] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x7f87bc0ac890 con 0x7f87b804ba90
    -4> 2016-05-10 15:46:48.316964 7f87d02c9700  1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1161 ==== osd_op_reply(7088 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+40 (81791933 0 3248482275) 0x7f87a4215100 con 0x7f87b804ba90
    -3> 2016-05-10 15:46:48.318556 7f87a3fff700  1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7089 0.b74753e6 rbd_mirroring [call rbd.mirror_image_get_image_id] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x7f87bc0908f0 con 0x7f87b804ba90
    -2> 2016-05-10 15:46:48.319707 7f87d02c9700  1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1162 ==== osd_op_reply(7089 rbd_mirroring [call] v0'0 uv10 ondisk = -2 ((2) No such file or directory)) v7 ==== 133+0+0 (1179139191 0 0) 0x7f87a4215100 con 0x7f87b804ba90
    -1> 2016-05-10 15:46:48.322395 7f87a97fa700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f87bc0df720 handle_get_remote_tag_class: failed to retreive remote client: (2) No such file or directory
     0> 2016-05-10 15:46:48.334622 7f87d0ccc700 -1 tools/rbd_mirror/ImageReplayer.cc: In function 'void rbd::mirror::ImageReplayer<ImageCtxT>::on_start_fail_finish(int) [with ImageCtxT = librbd::ImageCtx]' thread 7f87d0ccc700 time 2016-05-10 15:46:48.322900
tools/rbd_mirror/ImageReplayer.cc: 408: FAILED assert(r == -4)

 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x564de873201b]
 2: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 3: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 4: (Context::complete(int)+0x9) [0x564de8570729]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 7: (()+0x8182) [0x7f87d60aa182]
 8: (clone()+0x6d) [0x7f87d4a1b47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   1/ 1 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---
*** Caught signal (Aborted) **
 in thread 7f87d0ccc700 thread_name:tp_journal
 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (()+0x344db2) [0x564de86fddb2]
 2: (()+0x10340) [0x7f87d60b2340]
 3: (gsignal()+0x39) [0x7f87d4957cc9]
 4: (abort()+0x148) [0x7f87d495b0d8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x564de87321f5]
 6: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 7: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 8: (Context::complete(int)+0x9) [0x564de8570729]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 11: (()+0x8182) [0x7f87d60aa182]
 12: (clone()+0x6d) [0x7f87d4a1b47d]
Aborted
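The abort comes from a mismatch between the error the bootstrap step produced and the only error the completion handler tolerated. BootstrapRequest failed with "(2) No such file or directory", i.e. r = -ENOENT = -2, but ImageReplayer::on_start_fail_finish() asserted r == -4, which on Linux is -EINTR. The sketch below models that check in isolation. It is a hypothetical illustration, not the actual ImageReplayer code and not the fix referenced in #15909; the names legacy_assert_holds, classify_start_result and StartOutcome are invented, and reading -EINTR as "the start was interrupted/cancelled" is an assumption.

```cpp
#include <cerrno>

// HYPOTHETICAL model of the failing check, not Ceph source code.
// The backtrace shows on_start_fail_finish(int r) asserting r == -4;
// on Linux EINTR == 4, so only an interrupted start was tolerated.
constexpr bool legacy_assert_holds(int r) { return r == -EINTR; }

// HYPOTHETICAL defensive alternative: classify the start result instead of
// asserting, so an unexpected error is reported rather than aborting the daemon.
enum class StartOutcome { Cancelled, Failed, Started };

constexpr StartOutcome classify_start_result(int r) {
  return r == -EINTR ? StartOutcome::Cancelled  // assumed: start interrupted/stopped
       : r < 0       ? StartOutcome::Failed     // e.g. -ENOENT from bootstrap, as in this log
                     : StartOutcome::Started;   // r >= 0: start succeeded
}
```

With the values from this log, legacy_assert_holds(-ENOENT) is false, which is exactly the condition that tripped the assert at ImageReplayer.cc:408, while classify_start_result(-ENOENT) yields StartOutcome::Failed and leaves the process running.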
#1

Updated by Jason Dillaman almost 8 years ago

  • Description updated
#2

Updated by Jason Dillaman almost 8 years ago

  • Status changed from New to In Progress

Fix included with PR for issue #15909

#3

Updated by Jason Dillaman almost 8 years ago

  • Status changed from In Progress to Resolved