Bug #666: OSD: ReplicatedPG::eval_repop (closed)

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Urgent
Category: OSD
% Done: 0%

Description

On my 'noisy' machine I wanted to see how fast writing a 10GB file would go, but this caused some crashes.

This machine hosts 4 OSDs, each with a 200MB journal on a tmpfs.
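For reference, a hypothetical ceph.conf fragment approximating that kind of setup (the option names are standard, but the paths are placeholders, not taken from this cluster):

    ; Hypothetical sketch: 200MB journal per OSD, kept on tmpfs.
    [osd]
        osd data = /var/lib/ceph/osd.$id
        osd journal = /dev/shm/osd.$id.journal   ; tmpfs-backed journal (placeholder path)
        osd journal size = 200                   ; MB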

root@noisy:~# dd if=/dev/zero bs=1024k count=10240|pv > /mnt/ceph/10GB.bin
 9.8GB 0:06:47 [31.5MB/s] [                                    <=> 

The write stalled there and didn't go any further.

Around 3~4GB the OSDs started to crash; eventually 3 out of 4 OSDs crashed with the same backtrace:

2010-12-21 10:38:51.771608 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) async snap create 'snap_5545' transid 79 got 0 Success
2010-12-21 10:38:51.771673 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0)  waiting for transid 79 to complete
2010-12-21 10:38:51.771699 7fe3de3a4700 filestore(/var/lib/ceph/osd.0) queue_op 0x44986c0 seq 5546 1306 bytes   (queue has 1 ops and 1306 bytes)
2010-12-21 10:38:51.771731 7fe3de3a4700 osd0 17 dequeue_op 0x413eb80 finish
2010-12-21 10:38:51.771747 7fe3de3a4700 osd0 17 dequeue_op osd_sub_op_reply(client4104.1:890 0.51 10000000000.00000373/head [] ondisk = 0) v1 pg pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean], 28 more pending
2010-12-21 10:38:51.771795 7fe3de3a4700 osd0 17 take_peer_stat peer osd1 stat(2010-12-21 10:38:51.014427 oprate=2.67588 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-21 10:38:51.771835 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] repop_ack rep_tid 478 op osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]) result 0 ack_type 4 from osd1
2010-12-21 10:38:51.771863 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] eval_repop repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) wants=d
2010-12-21 10:38:51.771888 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean]  sending commit on repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) 0x43838c0
2010-12-21 10:38:51.771934 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]  removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.771959 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]    q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
2010-12-21 10:38:51.771983 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]  removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.772004 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]    q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)':
osd/ReplicatedPG.cc:2147: FAILED assert(repop_queue.front() == repop)
 ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
 1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x852) [0x484cb2]
 2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x168) [0x4865b8]
 3: (ReplicatedPG::sub_op_modify_reply(MOSDSubOpReply*)+0x13c) [0x4868dc]
 4: (OSD::dequeue_op(PG*)+0x122) [0x4e4172]
 5: (ThreadPool::worker()+0x28f) [0x5dd40f]
 6: (ThreadPool::WorkThread::entry()+0xd) [0x508e1d]
 7: (Thread::_entry_func(void*)+0xa) [0x479baa]
 8: (()+0x69ca) [0x7fe3e9d2e9ca]
 9: (clone()+0x6d) [0x7fe3e89ba70d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
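
For context, a minimal stand-alone sketch (not the actual Ceph source) of the invariant that the failed assert at ReplicatedPG.cc:2147 checks: replicated ops are queued in order and are expected to be retired from the front of repop_queue in that same order. In the log above, rep_tid=478 is being removed while the queue front is still rep_tid=461 (still waiting on disk, wfdisk=1), which breaks that assumption and trips the assert.

    // Minimal model of the FIFO invariant; names are stand-ins, not Ceph's types.
    #include <cassert>
    #include <deque>

    struct RepGather {            // stand-in for ReplicatedPG::RepGather
      unsigned rep_tid;
      bool waiting_for_disk;      // analogous to the "wfdisk" state in the log
    };

    struct RepopQueue {
      std::deque<RepGather*> q;

      void queue(RepGather* r) { q.push_back(r); }

      // Called once all acks/commits for 'r' have arrived (cf. eval_repop).
      void retire(RepGather* r) {
        // The code assumes strictly in-order completion; if an op behind the
        // front finishes first, this is the failed assertion in the backtrace.
        assert(!q.empty() && q.front() == r);
        q.pop_front();
        delete r;
      }
    };

    int main() {
      RepopQueue rq;
      auto* a = new RepGather{461, true};
      auto* b = new RepGather{478, false};
      rq.queue(a);
      rq.queue(b);
      rq.retire(a);   // in-order retirement: fine
      rq.retire(b);   // fine
      // Retiring b before a would have tripped the assert, as in this bug.
      return 0;
    }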

It's easily reproducible: just set up a similar cluster and start writing the 10GB file.

What I did notice is that the write speed dropped pretty hard as soon as my journal filled up, but I assume this is the same as #531 and #615?

I ran cdebugpack and uploaded the data to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_eval_repop. During the test, debug osd and debug filestore were set to 20.
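For anyone reproducing this, one way to get the same logging level is to set the debug options in ceph.conf before starting the daemons (values below match what was used for this capture; placement under [osd] is just one option):

    [osd]
        debug osd = 20
        debug filestore = 20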

After restarting the crashed OSDs the cluster fully recovered, but the FS stayed unresponsive and a 'sync' stalled:

libceph:  tid 2530 timed out on osd1, will reset osd

But the logs didn't show that the OSD was stalled; its logs showed it was running and doing work.

I then restarted osd1 and after a few minutes the 'sync' finished.
