Bug #666
OSD: ReplicatedPG::eval_repop (Closed)
Description
On my 'noisy' machine I wanted to see how fast writing a 10GB file would go, but this caused some crashes.
This machine hosts 4 OSDs, each with a 200MB journal on a tmpfs.
root@noisy:~# dd if=/dev/zero bs=1024k count=10240|pv > /mnt/ceph/10GB.bin
9.8GB 0:06:47 [31.5MB/s] [ <=>
The write stalled there and didn't go any further.
Around 3~4GB the OSDs started to crash; eventually 3 out of 4 OSDs crashed with the same backtrace:
2010-12-21 10:38:51.771608 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) async snap create 'snap_5545' transid 79 got 0 Success
2010-12-21 10:38:51.771673 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) waiting for transid 79 to complete
2010-12-21 10:38:51.771699 7fe3de3a4700 filestore(/var/lib/ceph/osd.0) queue_op 0x44986c0 seq 5546 1306 bytes (queue has 1 ops and 1306 bytes)
2010-12-21 10:38:51.771731 7fe3de3a4700 osd0 17 dequeue_op 0x413eb80 finish
2010-12-21 10:38:51.771747 7fe3de3a4700 osd0 17 dequeue_op osd_sub_op_reply(client4104.1:890 0.51 10000000000.00000373/head [] ondisk = 0) v1 pg pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean], 28 more pending
2010-12-21 10:38:51.771795 7fe3de3a4700 osd0 17 take_peer_stat peer osd1 stat(2010-12-21 10:38:51.014427 oprate=2.67588 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-21 10:38:51.771835 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] repop_ack rep_tid 478 op osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]) result 0 ack_type 4 from osd1
2010-12-21 10:38:51.771863 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] eval_repop repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) wants=d
2010-12-21 10:38:51.771888 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] sending commit on repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) 0x43838c0
2010-12-21 10:38:51.771934 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.771959 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
2010-12-21 10:38:51.771983 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.772004 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)':
osd/ReplicatedPG.cc:2147: FAILED assert(repop_queue.front() == repop)
ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
 1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x852) [0x484cb2]
 2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x168) [0x4865b8]
 3: (ReplicatedPG::sub_op_modify_reply(MOSDSubOpReply*)+0x13c) [0x4868dc]
 4: (OSD::dequeue_op(PG*)+0x122) [0x4e4172]
 5: (ThreadPool::worker()+0x28f) [0x5dd40f]
 6: (ThreadPool::WorkThread::entry()+0xd) [0x508e1d]
 7: (Thread::_entry_func(void*)+0xa) [0x479baa]
 8: (()+0x69ca) [0x7fe3e9d2e9ca]
 9: (clone()+0x6d) [0x7fe3e89ba70d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
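For context on the assert itself: it checks an ordering invariant on the PG's queue of in-flight replicated writes, namely that a completed repop being retired is the one at the front of repop_queue (i.e. writes are retired in submission order). Below is a minimal sketch of that invariant with simplified, assumed fields; the names RepGather, repop_queue and eval_repop come from the log and backtrace, but this is not the actual osd/ReplicatedPG.cc code.

// Minimal illustrative sketch, not the real OSD code.
#include <cassert>
#include <list>

struct RepGather {
    unsigned tid;            // "rep_tid" in the log
    bool waiting_for_ack;    // "wfack" in the log
    bool waiting_for_disk;   // "wfdisk" in the log
    bool done() const { return !waiting_for_ack && !waiting_for_disk; }
};

struct PGSketch {
    std::list<RepGather*> repop_queue;   // repops in submission order

    // Called when a repop has collected another ack/commit.  In the crash
    // above, the repop being retired (rep_tid 478) was done, but an earlier
    // repop (rep_tid 461, wfdisk=1) was still at the front of the queue.
    void eval_repop(RepGather* repop) {
        if (!repop->done())
            return;                              // still waiting on replicas/disk
        assert(repop_queue.front() == repop);    // the assert that failed here
        repop_queue.pop_front();
        delete repop;
    }
};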
It's easily reproducible, just setup a similar cluster and start writing the 10GB file.
What I did notice is that write speeds dropped pretty hard as soon as my journal filled up, but I assume this is the same as #531 and #615?
I ran cdebugpack and uploaded the data to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_eval_repop. During the test, debug osd and debug filestore were set to 20.
After restarting the crashed OSDs the cluster fully recovered, but the filesystem stayed unresponsive; a 'sync' stalled:
libceph: tid 2530 timed out on osd1, will reset osd
But the logs didn't show that the OSD was stalled; its logs showed it was running and handling operations.
I then restarted osd1 and after a few minutes the 'sync' finished.
Updated by Wido den Hollander over 13 years ago
Forgot to mention, this was on the RC branch, commit: 180a4176035521940390f4ce24ee3eb7aa290632 (v0.24)
Updated by Sage Weil over 13 years ago
- Status changed from New to Resolved
- Assignee set to Sage Weil
Updated by Jim Schutt over 13 years ago
Hit another instance of this on current unstable (d386ea2ea0b).
I've got logs from debug osd = 20 + debug filestore = 20, if that helps.
This was 64 clients writing 16 GiB of data total to 16 OSDs, with 4 OSDs per server.
Each OSD had a ~500 MiB journal partition.
Here's a gdb backtrace:
(gdb) bt
#0  0x00007fdab48509dd in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000065c4ce in handle_fatal_signal (signum=6) at config.cc:253
#2  <signal handler called>
#3  0x00007fdab36bb265 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4  0x00007fdab36bcd10 in abort () at abort.c:88
#5  0x00007fdab3f31cb4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#6  0x00007fdab3f2fdb6 in ?? () from /usr/lib64/libstdc++.so.6
#7  0x00007fdab3f2fde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#8  0x00007fdab3f2feca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#9  0x0000000000646e18 in ceph::__ceph_assert_fail (assertion=0x6803e0 "repop_queue.front() == repop", file=0x67f332 "osd/ReplicatedPG.cc", line=2496, func=0x682080 "void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)") at common/assert.cc:25
#10 0x00000000004c300e in ReplicatedPG::eval_repop (this=0x7fda94516c30, repop=0x7fdaa07d0e50) at osd/ReplicatedPG.cc:2496
#11 0x00000000004c3529 in ReplicatedPG::repop_ack (this=0x7fda94516c30, repop=0x7fdaa07d0e50, result=0, ack_type=4, fromosd=4, peer_lcod=...) at osd/ReplicatedPG.cc:2638
#12 0x00000000004c3960 in ReplicatedPG::sub_op_modify_reply (this=0x7fda94516c30, r=0x1cb6c20) at osd/ReplicatedPG.cc:3058
#13 0x0000000000505082 in OSD::dequeue_op (this=0x13c7460, pg=0x7fda94516c30) at osd/OSD.cc:5252
#14 0x0000000000648ed9 in ThreadPool::worker (this=0x13c7818) at common/WorkQueue.cc:44
#15 0x000000000054197d in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:113
#16 0x0000000000493877 in Thread::_entry_func (arg=0x31b7) at ./common/Thread.h:39
#17 0x00007fdab484873d in start_thread (arg=<value optimized out>) at pthread_create.c:301
#18 0x00007fdab375ef6d in clone () from /lib64/libc.so.6
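For anyone following the frames: the failing path is a replica's commit reply being handled on the OSD's op worker thread, i.e. frames #12 → #11 → #10 (sub_op_modify_reply → repop_ack → eval_repop). A rough, hypothetical outline of that dispatch chain, using only the names visible in the backtrace and with the bodies simplified as assumptions, is:

// Rough outline of frames #12-#10; the bodies here are simplified assumptions,
// only the function names come from the backtrace.
struct MOSDSubOpReplySketch {
    int ack_type;   // 4 (commit/ondisk) in the backtrace
    int from_osd;   // the replica that sent the reply
};

struct RepGatherSketch {
    bool waiting_for_ack  = true;
    bool waiting_for_disk = true;
};

struct PGAckPathSketch {
    // frame #12: match the reply to its in-flight RepGather
    void sub_op_modify_reply(const MOSDSubOpReplySketch& r, RepGatherSketch* repop) {
        repop_ack(repop, 0 /*result*/, r.ack_type, r.from_osd);
    }

    // frame #11: record the ack/commit from that replica...
    void repop_ack(RepGatherSketch* repop, int result, int ack_type, int from_osd) {
        (void)result; (void)ack_type; (void)from_osd;
        repop->waiting_for_ack = false;   // no longer waiting on this peer
        eval_repop(repop);                // frame #10: re-check completion
    }

    // frame #10: retire the repop if done; the real code asserts here that
    // the repop being retired is repop_queue.front().
    void eval_repop(RepGatherSketch* /*repop*/) {}
};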
Updated by Sage Weil over 13 years ago
- Status changed from Resolved to In Progress
Updated by Jim Schutt over 13 years ago
I have to use our managed transfer app for that; you should be getting an email about the transfer.
Let me know if you don't get it soon.
Updated by Jim Schutt over 13 years ago
FWIW, I seem to be able to reproduce this at will.
I've been trying to understand what I'm seeing in the logs - it seems like the op that is (incorrectly?) at the front of the repop_queue hasn't been committed yet at the time removal of the other op is attempted.
I think I have several sets of logs now where that seems to be the case.
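That reading matches the first crash log above; a small self-contained sketch of the interleaving (hypothetical types, not Ceph code) would be:

// Hypothetical reconstruction of the interleaving described above.
#include <cassert>
#include <iostream>
#include <list>

struct Repop {
    unsigned tid;
    bool waiting_for_disk;   // local journal/disk commit still outstanding
    bool waiting_for_ack;    // replica ack still outstanding
    bool done() const { return !waiting_for_disk && !waiting_for_ack; }
};

int main() {
    // Two writes queued in submission order, as in the log.
    Repop a{461, true,  false};   // front of the queue, "wfdisk=1" in the log
    Repop b{478, false, true};    // later op, only waiting on a replica ack

    std::list<Repop*> repop_queue{&a, &b};

    // The replica ack for the *later* op arrives first (ack_type 4 from osd1)...
    b.waiting_for_ack = false;

    // ...so the later op is evaluated while the earlier one is still at the front.
    Repop* repop = &b;
    if (repop->done()) {
        std::cout << "q front is rep_tid " << repop_queue.front()->tid << "\n";
        assert(repop_queue.front() == repop);   // fires, as in the crash
    }
    return 0;
}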
Updated by Sage Weil over 13 years ago
- Target version changed from v0.24.1 to v0.24.2
Updated by Sage Weil over 13 years ago
- Status changed from In Progress to Resolved