Bug #15829
RBD MIRROR: assert fails in ImageReplayer, daemon aborts
Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-15> 2016-05-10 15:46:18.315359 7f87d02c9700 1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1158 ==== osd_op_reply(7085 rbd_mirroring [call] v0'0 uv10 ondisk = -2 ((2) No such file or directory)) v7 ==== 133+0+0 (1179139191 0 0) 0x7f87a4215100 con 0x7f87b804ba90
-14> 2016-05-10 15:46:18.361126 7f87a37fe700 5 rbd::mirror::image_replayer::BootstrapRequest: 0x7f87bc002510 handle_open_remote_image: remote image is not primary -- skipping image replay
-13> 2016-05-10 15:46:18.368207 7f87d0ccc700 5 rbd-mirror: ImageReplayer[0/101a238e1f29]::handle_bootstrap: remote image is non-primary or local image is primary
-12> 2016-05-10 15:46:27.944120 7f87c97fa700 10 monclient: tick
-11> 2016-05-10 15:46:37.945371 7f87c97fa700 10 monclient: tick
-10> 2016-05-10 15:46:41.983970 7f87e0245c80 1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7086 0.b74753e6 rbd_mirroring [call rbd.mirror_mode_get] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x564df3b80720 con 0x7f87b804ba90
-9> 2016-05-10 15:46:41.985352 7f87d02c9700 1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1159 ==== osd_op_reply(7086 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+4 (4269282549 0 3211207553) 0x7f87a4215100 con 0x7f87b804ba90
-8> 2016-05-10 15:46:41.985458 7f87e0245c80 1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7087 0.b74753e6 rbd_mirroring [call rbd.mirror_peer_list] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x564df3bad530 con 0x7f87b804ba90
-7> 2016-05-10 15:46:41.986288 7f87d02c9700 1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1160 ==== osd_op_reply(7087 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+87 (2619010202 0 1398442284) 0x7f87a4215100 con 0x7f87b804ba90
-6> 2016-05-10 15:46:47.946763 7f87c97fa700 10 monclient: tick
-5> 2016-05-10 15:46:48.314780 7f87a3fff700 1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7088 0.b74753e6 rbd_mirroring [call rbd.mirror_uuid_get] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x7f87bc0ac890 con 0x7f87b804ba90
-4> 2016-05-10 15:46:48.316964 7f87d02c9700 1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1161 ==== osd_op_reply(7088 rbd_mirroring [call] v0'0 uv10 ondisk = 0) v7 ==== 133+0+40 (81791933 0 3248482275) 0x7f87a4215100 con 0x7f87b804ba90
-3> 2016-05-10 15:46:48.318556 7f87a3fff700 1 -- 172.16.0.20:0/2510431460 --> 172.16.0.20:6800/1848 -- osd_op(client.24138.0:7089 0.b74753e6 rbd_mirroring [call rbd.mirror_image_get_image_id] snapc 0=[] ack+read+known_if_redirected e21) v7 -- ?+0 0x7f87bc0908f0 con 0x7f87b804ba90
-2> 2016-05-10 15:46:48.319707 7f87d02c9700 1 -- 172.16.0.20:0/2510431460 <== osd.0 172.16.0.20:6800/1848 1162 ==== osd_op_reply(7089 rbd_mirroring [call] v0'0 uv10 ondisk = -2 ((2) No such file or directory)) v7 ==== 133+0+0 (1179139191 0 0) 0x7f87a4215100 con 0x7f87b804ba90
-1> 2016-05-10 15:46:48.322395 7f87a97fa700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f87bc0df720 handle_get_remote_tag_class: failed to retreive remote client: (2) No such file or directory
0> 2016-05-10 15:46:48.334622 7f87d0ccc700 -1 tools/rbd_mirror/ImageReplayer.cc: In function 'void rbd::mirror::ImageReplayer<ImageCtxT>::on_start_fail_finish(int) [with ImageCtxT = librbd::ImageCtx]' thread 7f87d0ccc700 time 2016-05-10 15:46:48.322900
tools/rbd_mirror/ImageReplayer.cc: 408: FAILED assert(r == -4)
 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x564de873201b]
 2: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 3: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 4: (Context::complete(int)+0x9) [0x564de8570729]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 7: (()+0x8182) [0x7f87d60aa182]
 8: (clone()+0x6d) [0x7f87d4a1b47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 rbd_mirror
 0/ 5 rbd_replay
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 0/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 1/ 3 filestore
 1/ 3 journal
 1/ 1 ms
 1/ 5 mon
 0/10 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 5 heartbeatmap
 1/ 5 perfcounter
 1/ 5 rgw
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 0/ 0 refs
 1/ 5 xio
 1/ 5 compressor
 1/ 5 newstore
 1/ 5 bluestore
 1/ 5 bluefs
 1/ 3 bdev
 1/ 5 kstore
 4/ 5 rocksdb
 4/ 5 leveldb
 1/ 5 kinetic
 1/ 5 fuse
 -2/-2 (syslog threshold)
 99/99 (stderr threshold)
 max_recent 10000
 max_new 1000
 log_file
--- end dump of recent events ---
*** Caught signal (Aborted) **
 in thread 7f87d0ccc700 thread_name:tp_journal
 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (()+0x344db2) [0x564de86fddb2]
 2: (()+0x10340) [0x7f87d60b2340]
 3: (gsignal()+0x39) [0x7f87d4957cc9]
 4: (abort()+0x148) [0x7f87d495b0d8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x564de87321f5]
 6: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 7: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 8: (Context::complete(int)+0x9) [0x564de8570729]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 11: (()+0x8182) [0x7f87d60aa182]
 12: (clone()+0x6d) [0x7f87d4a1b47d]
2016-05-10 15:46:48.500155 7f87d0ccc700 -1 *** Caught signal (Aborted) **
 in thread 7f87d0ccc700 thread_name:tp_journal
 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (()+0x344db2) [0x564de86fddb2]
 2: (()+0x10340) [0x7f87d60b2340]
 3: (gsignal()+0x39) [0x7f87d4957cc9]
 4: (abort()+0x148) [0x7f87d495b0d8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x564de87321f5]
 6: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 7: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 8: (Context::complete(int)+0x9) [0x564de8570729]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 11: (()+0x8182) [0x7f87d60aa182]
 12: (clone()+0x6d) [0x7f87d4a1b47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2016-05-10 15:46:48.500155 7f87d0ccc700 -1 *** Caught signal (Aborted) **
 in thread 7f87d0ccc700 thread_name:tp_journal
 ceph version 10.2.0-21-g791eba8 (791eba81a5467dd5de4f1680ed0deb647eb3fb8b)
 1: (()+0x344db2) [0x564de86fddb2]
 2: (()+0x10340) [0x7f87d60b2340]
 3: (gsignal()+0x39) [0x7f87d4957cc9]
 4: (abort()+0x148) [0x7f87d495b0d8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x564de87321f5]
 6: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_start_fail_finish(int)+0x795) [0x564de85747d5]
 7: (FunctionContext::finish(int)+0x1a) [0x564de8571eea]
 8: (Context::complete(int)+0x9) [0x564de8570729]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x564de872386e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x564de8724740]
 11: (()+0x8182) [0x7f87d60aa182]
 12: (clone()+0x6d) [0x7f87d4a1b47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 rbd_mirror
 0/ 5 rbd_replay
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 0/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 1/ 3 filestore
 1/ 3 journal
 1/ 1 ms
 1/ 5 mon
 0/10 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 5 heartbeatmap
 1/ 5 perfcounter
 1/ 5 rgw
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 0/ 0 refs
 1/ 5 xio
 1/ 5 compressor
 1/ 5 newstore
 1/ 5 bluestore
 1/ 5 bluefs
 1/ 3 bdev
 1/ 5 kstore
 4/ 5 rocksdb
 4/ 5 leveldb
 1/ 5 kinetic
 1/ 5 fuse
 -2/-2 (syslog threshold)
 99/99 (stderr threshold)
 max_recent 10000
 max_new 1000
 log_file
--- end dump of recent events ---
Aborted
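The two lines that matter in the log above are the error logged by handle_get_remote_tag_class, "(2) No such file or directory", and the assertion that fired right after it, assert(r == -4). The following is a minimal sketch for decoding both numbers. It assumes Linux errno numbering, and it assumes that the error logged immediately before the assertion is the value that reached on_start_fail_finish, which the log suggests but does not state. The helper names errno_name and expected_and_observed are hypothetical and are not part of Ceph.

```python
import errno
import os
import re

# Two fragments copied verbatim from the log in this report.
LOG = (
    "handle_get_remote_tag_class: failed to retreive remote client: "
    "(2) No such file or directory\n"
    "tools/rbd_mirror/ImageReplayer.cc: 408: FAILED assert(r == -4)\n"
)


def errno_name(code):
    """Return the symbolic errno name for a (possibly negative) code."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


def expected_and_observed(log):
    """Extract the code the assert expected and the last error that was logged.

    Assumption: the logged "(N) message" error is returned as -N, following
    the negative-errno convention visible elsewhere in the log ("ondisk = -2").
    """
    expected = int(re.search(r"FAILED assert\(r == (-?\d+)\)", log).group(1))
    observed = -int(re.search(r": \((\d+)\) ", log).group(1))
    return expected, observed


expected, observed = expected_and_observed(LOG)
print("assert expected:", expected, errno_name(expected))
print("last logged error:", observed, errno_name(observed), "-", os.strerror(-observed))
```

Under these assumptions the assertion only tolerates an interrupted start (EINTR), while the bootstrap step failed with ENOENT, which would explain why the daemon aborted instead of reporting the error.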
Updated by Jason Dillaman almost 8 years ago
- Status changed from New to In Progress
Fix included with PR for issue #15909
Updated by Jason Dillaman almost 8 years ago
- Status changed from In Progress to Resolved