Bug #666 (closed)

OSD: ReplicatedPG::eval_repop

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Urgent
Category: OSD
% Done: 0%

Description

On my 'noisy' machine I wanted to see how fast writing a 10GB file would go, but this caused some crashes.

This machine hosts 4 OSDs, each with a 200MB journal on tmpfs.

root@noisy:~# dd if=/dev/zero bs=1024k count=10240|pv > /mnt/ceph/10GB.bin
 9.8GB 0:06:47 [31.5MB/s] [                                    <=> 

The write stalled there and didn't go any further.

Around 3~4GB the OSDs started to crash; eventually 3 out of 4 OSDs had crashed with the same backtrace:

2010-12-21 10:38:51.771608 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) async snap create 'snap_5545' transid 79 got 0 Success
2010-12-21 10:38:51.771673 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0)  waiting for transid 79 to complete
2010-12-21 10:38:51.771699 7fe3de3a4700 filestore(/var/lib/ceph/osd.0) queue_op 0x44986c0 seq 5546 1306 bytes   (queue has 1 ops and 1306 bytes)
2010-12-21 10:38:51.771731 7fe3de3a4700 osd0 17 dequeue_op 0x413eb80 finish
2010-12-21 10:38:51.771747 7fe3de3a4700 osd0 17 dequeue_op osd_sub_op_reply(client4104.1:890 0.51 10000000000.00000373/head [] ondisk = 0) v1 pg pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean], 28 more pending
2010-12-21 10:38:51.771795 7fe3de3a4700 osd0 17 take_peer_stat peer osd1 stat(2010-12-21 10:38:51.014427 oprate=2.67588 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-21 10:38:51.771835 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] repop_ack rep_tid 478 op osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]) result 0 ack_type 4 from osd1
2010-12-21 10:38:51.771863 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] eval_repop repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) wants=d
2010-12-21 10:38:51.771888 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean]  sending commit on repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) 0x43838c0
2010-12-21 10:38:51.771934 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]  removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.771959 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]    q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
2010-12-21 10:38:51.771983 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]  removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.772004 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean]    q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)':
osd/ReplicatedPG.cc:2147: FAILED assert(repop_queue.front() == repop)
 ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
 1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x852) [0x484cb2]
 2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x168) [0x4865b8]
 3: (ReplicatedPG::sub_op_modify_reply(MOSDSubOpReply*)+0x13c) [0x4868dc]
 4: (OSD::dequeue_op(PG*)+0x122) [0x4e4172]
 5: (ThreadPool::worker()+0x28f) [0x5dd40f]
 6: (ThreadPool::WorkThread::entry()+0xd) [0x508e1d]
 7: (Thread::_entry_func(void*)+0xa) [0x479baa]
 8: (()+0x69ca) [0x7fe3e9d2e9ca]
 9: (clone()+0x6d) [0x7fe3e89ba70d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
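
For context, the failed assert checks that replicated writes complete in the order they were queued per PG. Below is a minimal sketch of that invariant (assumed, simplified structure; RepGather, repop_queue and eval_repop here are stand-ins, not the actual ReplicatedPG code):

// Minimal sketch: each PG keeps its in-flight replicated writes in a
// FIFO queue, and a write is only removed once all acks are in and it
// is the oldest entry, i.e. everything queued before it has completed.
#include <cassert>
#include <cstdio>
#include <deque>

struct RepGather {
    unsigned rep_tid;      // replication transaction id
    bool waiting_on_disk;  // still waiting for a replica commit?
};

std::deque<RepGather*> repop_queue;  // oldest in-flight write at the front

void eval_repop(RepGather *repop) {
    if (repop->waiting_on_disk)
        return;  // not finished yet, leave it queued
    // Writes are expected to finish in submission order, so the finished
    // repop must be the front of the queue; this is the check that fails
    // at osd/ReplicatedPG.cc:2147 in the backtrace above.
    assert(repop_queue.front() == repop);
    repop_queue.pop_front();
    std::printf("removed rep_tid %u\n", repop->rep_tid);
}

int main() {
    RepGather a{461, false}, b{478, false};
    repop_queue = {&a, &b};
    eval_repop(&a);  // in-order completion: the assert holds
    eval_repop(&b);
    return 0;
}

In the log excerpt, rep_tid 478 had gathered all of its acks while rep_tid 461 (the queue front) was still wfdisk=1, so the repop being removed was not the front and the assert fired.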

It's easily reproducible: just set up a similar cluster and start writing a 10GB file.

What I did notice is that write speeds dropped pretty hard as soon as my journal filled up, but I assume this is the same as #531 and #615?

I ran cdebugpack and uploaded the data to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_eval_repop. During the test, debug osd and debug filestore were at 20.
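
For reference, debug levels like these are normally set in the [osd] section of ceph.conf, roughly of this form (illustrative snippet of the usual syntax, not the exact configuration used here):

[osd]
        debug osd = 20
        debug filestore = 20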

After restarting the crashed OSDs the cluster fully recovered, but the FS stayed unresponsive and a 'sync' stalled:

libceph:  tid 2530 timed out on osd1, will reset osd

But the logs didn't show that the OSD was stalled; its logs showed it was still running and doing work.

I then restarted osd1 and after a few minutes the 'sync' finished.

Actions #1

Updated by Wido den Hollander over 13 years ago

Forgot to mention, this was on the RC branch, commit: 180a4176035521940390f4ce24ee3eb7aa290632 (v0.24)

Actions #2

Updated by Sage Weil over 13 years ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by Sage Weil over 13 years ago

  • Target version set to v0.24.1
Actions #4

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil
Actions #5

Updated by Jim Schutt over 13 years ago

Hit another instance of this on current unstable (d386ea2ea0b).

I've got logs from debug osd = 20 + debug filestore = 20, if that helps.

This was 64 clients writing 16 GiB of data in total to 16 OSDs, with 4 OSDs per server.
Each OSD had a ~500 MiB journal partition.

Here's a gdb backtrace:

(gdb) bt
#0  0x00007fdab48509dd in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000065c4ce in handle_fatal_signal (signum=6) at config.cc:253
#2  <signal handler called>
#3  0x00007fdab36bb265 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4  0x00007fdab36bcd10 in abort () at abort.c:88
#5  0x00007fdab3f31cb4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#6  0x00007fdab3f2fdb6 in ?? () from /usr/lib64/libstdc++.so.6
#7  0x00007fdab3f2fde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#8  0x00007fdab3f2feca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#9  0x0000000000646e18 in ceph::__ceph_assert_fail (assertion=0x6803e0 "repop_queue.front() == repop", file=0x67f332 "osd/ReplicatedPG.cc", line=2496, 
    func=0x682080 "void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)") at common/assert.cc:25
#10 0x00000000004c300e in ReplicatedPG::eval_repop (this=0x7fda94516c30, repop=0x7fdaa07d0e50) at osd/ReplicatedPG.cc:2496
#11 0x00000000004c3529 in ReplicatedPG::repop_ack (this=0x7fda94516c30, repop=0x7fdaa07d0e50, result=0, ack_type=4, fromosd=4, peer_lcod=...) at osd/ReplicatedPG.cc:2638
#12 0x00000000004c3960 in ReplicatedPG::sub_op_modify_reply (this=0x7fda94516c30, r=0x1cb6c20) at osd/ReplicatedPG.cc:3058
#13 0x0000000000505082 in OSD::dequeue_op (this=0x13c7460, pg=0x7fda94516c30) at osd/OSD.cc:5252
#14 0x0000000000648ed9 in ThreadPool::worker (this=0x13c7818) at common/WorkQueue.cc:44
#15 0x000000000054197d in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:113
#16 0x0000000000493877 in Thread::_entry_func (arg=0x31b7) at ./common/Thread.h:39
#17 0x00007fdab484873d in start_thread (arg=<value optimized out>) at pthread_create.c:301
#18 0x00007fdab375ef6d in clone () from /lib64/libc.so.6
Actions #6

Updated by Sage Weil over 13 years ago

Can you post logs?

Thanks!

Actions #7

Updated by Sage Weil over 13 years ago

  • Status changed from Resolved to In Progress
Actions #8

Updated by Jim Schutt over 13 years ago

I have to use our managed transfer app for that; you should be getting an email about the transfer.

Let me know if you don't get it soon.

Actions #9

Updated by Jim Schutt over 13 years ago

FWIW, I seem to be able to reproduce this at will.

I've been trying to understand what I'm seeing in the logs: it seems like the op that is (incorrectly?) at the front of the repop_queue hasn't been committed at the time removal is attempted for the other op.

I think I've got several sets of logs now where that seems to be the case.
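
To make that concrete, here is a toy model of the scenario (simplified and assumed, not the actual OSD code; the tids are taken from the log excerpt in the description):

// Toy model: the op at the front of the queue (rep_tid 461, still
// wfdisk=1 in the logs) has not committed yet, while a later op
// (rep_tid 478) already has all of its acks, so the finished op is not
// the queue front and the eval_repop assert would fire.
#include <cstdio>
#include <deque>

struct Repop {
    unsigned tid;
    bool committed;
};

int main() {
    std::deque<Repop> q = { {461, false}, {478, false} };  // oldest first

    q[1].committed = true;            // the later op commits first
    const Repop &finished = q[1];

    if (&q.front() != &finished) {    // the condition the assert enforces
        std::printf("q front is tid %u (committed=%s), but tid %u finished first\n",
                    q.front().tid, q.front().committed ? "yes" : "no",
                    finished.tid);
    }
    return 0;
}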

Actions #10

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.24.1 to v0.24.2
Actions #11

Updated by Sage Weil over 13 years ago

  • Status changed from In Progress to Resolved